Course Overview


0.1 Outline 


Linear algebra: the study of linear equations, vectors, matrices, and vector spaces. Let’s 


discuss these informally — we’ll go over these ideas more precisely later. 


1. Linear algebra begins as the study of linear equations. A linear equation is a sum of variables
with coefficients, set equal to a constant, like

2x - 3y + 4z = 4.

This is a simple type of equation, the kind with which you have the most familiarity.
It is an equation whose graph is straight: a line or a plane.


These are the most simple equations around, and the only ones that we really know 
about. The lesson of calculus is that if a function is locally linear, i.e., smooth under 
a magnifying glass, then we can study it. That is, if a function is differentiable, we 


can study it with calculus. 


A solution to
2x - 3y + 4z = 4
is a choice of x, y, z that makes the equation true, like x = 6, y = 3, z = 1/4.


The next step is to consider systems of linear equations (that is, a group of a few 


linear equations). 


Can they all be solved simultaneously? Is the solution unique? The answers depend 


on the system under consideration: not all can be solved, and some have many 
solutions. A solution to a system like


2x - 3y + 4z = 4
   - 2y + 3z = 3
 x + 9y -  z = 0


is a choice of x,y,z that makes each equation true simultaneously. The solution 
also has a geometric interpretation: each of those equations is a plane in 3-space. A 
solution to this set of equations is a point in space where all three planes intersect. 


Why might such a thing not exist? How many ways can 3 planes intersect? 


(Examples with 3 sheets of paper) 


2. Matrices are a natural way to write and solve systems of linear equations. At first,


matrices seem like just a notational convenience. However, after working with them, 
one discovers that they allow for efficient computation. After working with them 
even more, you start to see how properties of the matrices give information about 


the system that might not be initially apparent. Example: determinants. 


3. Next, one is led to consider vectors, that is, objects which must be described
in terms of more than one coordinate or component, like (2, -3, i). The most


natural examples are points in the plane or in space. Vectors allow one to describe 


multidimensional phenomena, and so are inherently adapted to describing geometry. 


4. In order to study vectors, it is often helpful to consider the collection of all vectors


of a certain type. Examples: 


R^2 = {(x, y) : x, y are real numbers}
R^3 = {(x, y, z) : x, y, z are real numbers}
C^2 = {(x, y) : x, y are complex numbers}.


5. Abstracting vectors, one can describe them solely in terms of their properties. For


example, if you add two vectors, you get another vector (and it doesn’t matter which 
one came first). This leads to the notion of a vector space, that is, the collection of 


all vectors of a certain type. Examples: the three just above. 
How about

{(x, y, z) | x, y, z are integers}?

{(x, y, z) | x, y, z are positive}?


More interesting examples: 


{ f(x) : f is a polynomial in x} 
{ f(x): f is a continuous function} 


{ f(x): f is a differentiable function}. 


Also, we discover the essential notion of a basis. A basis is a toolkit containing the 
building blocks of your entire vector space. That means that any element of your 
vector space (i.e., any vector) can be built up out of basis elements. It is easy to 


provide an example of a basis of R^3:
(1,0,0), (0,1,0), (0,0,1). 
Here is an example of an expansion in terms of this basis: 


What is a basis for the space of polynomials? {1, x, x^2, x^3, ...}.


What is a basis for the space of continuous functions? This gets a bit more tricky, but 
there are many good answers. One of the classical answers to this question consists 
of functions that look like a cos θ + b sin θ, etc. This is the starting point of Fourier
analysis and is what almost all digital communication and information compression 


technology is based upon. 
Wavelets are a newer answer to this question. In fact, this is still an area of current 


research. 


6. If we add an operation (called inner product) to a vector space that gives us the 


angles between two vectors, then we can tell when two vectors are perpendicular 
(orthogonal), etc., and study geometry in any dimension.


It turns out that an inner product also gives a very good way to determine how 
far apart two points are in a vector space, and how to approximate one element by 


others. 


7. All this study of vectors gives a different perspective on matrices. A matrix can be


thought of as a transformation of vectors. (Specifically, a linear transformation.) How 
can one characterize such a transformation? Eigenvectors tell which directions remain 


fixed, and eigenvalues tell how much things are stretched in these fixed directions. 


There are innumerable applications of eigenvectors and eigenvalues, for example, 
long-term behavior. Suppose you have a transition matrix: its eigenvalues and eigenvectors will tell
you the long-term distribution of values. Example: soda pop sales projections.


8. We will continue to study linear transformations by examining the kernel (the set


of vectors that gets killed, i.e., sent to 0) and the range. Can a transformation be 


inverted? Sometimes, but not always; the determinant knows the answer. 


9. Finally, we'll discuss various applications as time permits. The previously mentioned
topics have uses in geometry, differential equations, data analysis, signal processing
and the approximation of functions in general, economics, business, electrical networks,
optimization, computer graphics, probability, game theory, fractals and chaos,
quantum mechanics, and generally any other area where you have multiple players,
goods, particles, etc.


The set of complex numbers C (aka the complex plane) is basically R^2 with multiplication.
Enroll in Complex Analysis if you want to discover the amazing and far-reaching
consequences of endowing points in R^2 with a new operation (multiplication) defined by

(u, v) · (x, y) = (ux - vy, uy + vx).
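As a quick sanity check of this multiplication rule, here is a minimal Python sketch. The function name `pair_multiply` is hypothetical (it is not from the notes); the sketch simply compares the rule above with Python's built-in complex arithmetic.

```python
def pair_multiply(p, q):
    """Multiply two points of R^2 using (u,v)*(x,y) = (ux - vy, uy + vx)."""
    u, v = p
    x, y = q
    return (u * x - v * y, u * y + v * x)

p, q = (1, 2), (3, -1)
product = pair_multiply(p, q)            # rule from the notes
builtin = complex(*p) * complex(*q)      # Python's own complex product

print(product)   # the pair (real part, imaginary part)
print(builtin)   # should describe the same point of the plane
```

The two results agree because identifying (u, v) with u + vi turns the rule into ordinary complex multiplication.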


0.2 Preliminaries and reference 


0.2.1 Common notations and terminology 


Sets will be defined by listing their elements or providing the criteria to be a member. 
Example 0.2.1.

A = {0, 2, 4, ...} (pattern)
B = {x : x^2 = 1} (algebraic req)
OC feta Si} 


D = {x : |x - y| ≤ 1} (geometric req)


Definition 0.2.2. If x is an element of the set A, we write x ∈ A; if not, we write x ∉ A.
If every element of the set A is also an element of the set B, then A is a subset of B and
we write A ⊆ B.


Example 0.2.3. If y = 0 in the example above, then we have an interval
D = [-1, 1] = {x ∈ R : -1 ≤ x ≤ 1}, and B ⊆ D.


Definition 0.2.4. If A ⊆ B and B ⊆ A, then the sets are equal and we write A = B.
∅ = {}, the empty set; it contains no elements and is a subset of every other set.
N = {1, 2, 3, ...}, the natural numbers
Z = {..., -2, -1, 0, 1, 2, ...}, the integers (“Zahlen” in German)
Q = {m/n : m, n ∈ Z, n ≠ 0}, the rational numbers (“quotients”)
R = the completion of Q (written as Q with an overbar), the real numbers
R^2 = {(x, y) : x, y ∈ R}, the plane
R^3 = {(x, y, z) : x, y, z ∈ R}, 3-space
R^n = {(x_1, x_2, ..., x_n) : each x_i ∈ R}, (Euclidean) n-space
Set operations:
intersection: A ∩ B = {x : x ∈ A and x ∈ B}
union: A ∪ B = {x : x ∈ A or x ∈ B}
complement: A^c = {x : x ∉ A}
difference: A \ B = {x : x ∈ A and x ∉ B} = A ∩ B^c
product: A × B = {(x, y) : x ∈ A and y ∈ B}
containment: A ⊆ B <=> (x ∈ A => x ∈ B)
Subsets of R:

[a, b] = {x : a ≤ x ≤ b} is a closed interval
(a, b) = {x : a < x < b} is an open interval
(a, b] or [a, b) are half-open intervals
[a, b]^2 = [a, b] × [a, b] = {(x, y) : a ≤ x, y ≤ b}


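Some common vector spaces (each has an analogue with R replaced by C):

R^2 = {(x, y) : x, y ∈ R} = R × R
R^3 = {(x, y, z) : x, y, z ∈ R} = R × R × R
R^n = {(x_1, x_2, ..., x_n) : x_i ∈ R for i = 1, ..., n}, with inner product
\[ \langle x, y \rangle = x \cdot y = \sum_{i=1}^{n} x_i y_i \]
C(X) = {f : X → R : f is continuous} = C^0(X), with inner product
\[ \langle f, g \rangle = \int f(x) g(x) \, dx \]
C^k(X) = {f : X → R : f, f', f'', ..., f^(k) are continuous}
M_{m,n} = {m × n matrices with entries from R}
P_n = {p(t) = a_0 + a_1 t + ... + a_n t^n : a_i ∈ R for i = 0, ..., n}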


Inner product (dot product) properties: 


‖x‖ = sqrt(⟨x, x⟩) = the length of x
‖x - y‖ = distance from x to y
⟨x, y⟩ = ‖x‖ ‖y‖ cos θ, so |⟨x, y⟩| ≤ ‖x‖ ‖y‖
⟨x, y⟩ = 0 <=> x ⊥ y
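The inner product properties above can be tried out numerically. The following Python sketch is illustrative only: the helper names `dot`, `norm`, `distance` and `angle` are assumptions of this example, and the vectors are arbitrary sample data.

```python
import math

def dot(x, y):
    """Inner (dot) product of two vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length of x, i.e. sqrt(<x, x>)."""
    return math.sqrt(dot(x, x))

def distance(x, y):
    """Distance from x to y, i.e. the length of x - y."""
    return norm([a - b for a, b in zip(x, y)])

def angle(x, y):
    """Angle between nonzero vectors, from <x, y> = |x||y| cos(theta)."""
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

x = [3.0, 4.0]
y = [-4.0, 3.0]
print(norm(x))            # length of x
print(dot(x, y))          # zero exactly when x and y are perpendicular
print(angle(x, y))        # angle in radians
print(abs(dot(x, y)) <= norm(x) * norm(y))  # Cauchy-Schwarz inequality
```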


0.2.2 Logic and inference 


“A implies B” is written A => B and means that if A is true, then B must also be true.
This is if-then or implication. A is the hypothesis and B is the conclusion. To say “the
hypothesis is satisfied” means that A is true. In this case, one can make the argument

A => B, and A holds,

and infer that B must therefore be true, also. Logical equivalence: when A => B
and B => A, then the statements are equivalent and we write “A if and only if B” as

A <=> B, A ≡ B, or A iff B.


Equivalent forms of an implication:

A  B | A => B | ¬(A and ¬B) | ¬A or B | ¬B => ¬A
T  T |   T    |      T      |    T    |    T
T  F |   F    |      F      |    F    |    F
F  T |   T    |      T      |    T    |    T
F  F |   T    |      T      |    T    |    T

DeMorgan laws:

A  B | ¬(A and B) | ¬A or ¬B | ¬(A or B) | ¬A and ¬B
T  T |     F      |    F     |     F     |     F
T  F |     T      |    T     |     F     |     F
F  T |     T      |    T     |     F     |     F
F  F |     T      |    T     |     T     |     T

Set version of the DeMorgan laws:

(A ∩ B)^c = A^c ∪ B^c, and (A ∪ B)^c = A^c ∩ B^c.

Distribution laws for sets:

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

Containment law (set version of contrapositive):

A ⊆ B <=> B^c ⊆ A^c <=> A ∩ B^c = ∅.

Universal quantifier: ∀x, A(x) means A(x) is true for all values of x.

Existential quantifier: ∃x, A(x) means A(x) is true for some x (at least one, anyway).

Note: ∃!x, A(x) means there is a unique x for which A(x) is true.

Quantifier rules:

¬∀x, A(x) ≡ ∃x, ¬A(x)
¬∃x, A(x) ≡ ∀x, ¬A(x)
∀x, ∀y, A(x, y) ≡ ∀y, ∀x, A(x, y)
∃x, ∃y, A(x, y) ≡ ∃y, ∃x, A(x, y)
∃x, ∀y, A(x, y) => ∀y, ∃x, A(x, y)

0.2.3 Proof Techniques

How to prove a statement of the form A => B.
Direct proof:
1. Assume the hypothesis A, for the moment. 


2. Use this assumption, and whatever else you know, to prove that B is true. 


Indirect proof:
Use the fact that A => B is equivalent to ¬B => ¬A (contrapositive).
1. Assume, for the moment, that the opposite of the conclusion, ¬B, is true.

2. Use this assumption, and whatever else you know, to prove ¬A is true.


Contradiction: 


This works for proving statements that are not necessarily of the form A => B. Suppose


you are asked to show that some proposition P is true. 
1. Assume, for the moment, that P is false. 


2. Show that this assumption implies a fallacy (like x < x, “9 is prime”, or some other 


blatant lie). 


Mathematical induction: 


This works for proving statements which are supposed to be true for every natural number. 


To prove that P(n) is true whenever n ∈ N:
1. Show P(1). 


2. Show that P(k) => P(k +1). 


Chapter 1 


Systems of Linear Equations and 


Matrices 


1.1 Introduction to Systems of Linear Eqns 


Definition 1.1.1. A linear equation in n variables is

a_1 x_1 + a_2 x_2 + ... + a_n x_n = b,

where a_i, b are real numbers and the x_i are unknown.


Example 1.1.2.
A linear equation in 2 variables looks like a x_1 + b x_2 = d where a, b, d are constants and
x_1, x_2 are the two variables:
• x_2 = 2x_1 + 1
• 3x_1 - x_2 = 4
• 2x_1 + x_2 - 1 = 0

A linear equation in 3 variables looks like a x_1 + b x_2 + c x_3 = d where a, b, c, d are
constants and x_1, x_2, x_3 are the three variables:
• 0.5x_1 - 3x_2 + x_3 = 2
• x_3 - 2x_2 + 3 = x_1
2 3 =_
e gti — qr2 = 3 


In general, a linear equation in n variables looks like

a_1 x_1 + a_2 x_2 + a_3 x_3 + ... + a_n x_n = b

where a_1, a_2, a_3, ..., a_n, b are constants and x_1, x_2, x_3, ..., x_n are n variables.


For contrast, here are some examples of equations that are not linear:

x_1 x_2 = 1                    The variables are multiplied together.
x_1 = 1/x_2                    Reciprocals are not linear (equivalent to above).
-2x_1 + x_2^2 + 3x_3 = 1       Raising any variable to a power produces a nonlinear eqn.
x_2 = sin x_1                  Trigonometric functions are not linear.
e^{x_1} - 3x_2 = 0             The exponential function is not linear.

General rule: linear iff graph is flat.


1.1.1 Systems of Linear Equations 


Definition 1.1.3. A system of linear equations (or linear system) is simply a collection 


of two or more equations which share the same variables. 


Example 1.1.4. Suppose you have a collection of dimes and nickels worth 80 cents, and 


you have 11 coins total. The associated system of linear equations is 


10x_1 + 5x_2 = 80
  x_1 +  x_2 = 11

Solution by back-substitution: the second equation may be rewritten as x_2 = 11 - x_1. This
new expression for x_2 may be substituted into the first equation to produce

10x_1 + 5(11 - x_1) = 80,

which then gives

5x_1 = 25 => x_1 = 5 and x_2 = 6.

Alternative: multiplying the second equation by -10 gives

 10x_1 +  5x_2 =   80
-10x_1 - 10x_2 = -110

and adding the two equations leaves

-5x_2 = -30,

which also gives x_2 = 6 and x_1 = 5.
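The elimination in this coin example can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the function name `solve_2x2` is hypothetical, and the code assumes the first coefficient is nonzero and the system has exactly one solution. Exact fractions are used so no rounding occurs.

```python
from fractions import Fraction

def solve_2x2(a, b, e, c, d, f):
    """Solve a*x1 + b*x2 = e and c*x1 + d*x2 = f by eliminating x1.

    Assumes a != 0 and that the system has a unique solution.
    """
    a, b, e, c, d, f = map(Fraction, (a, b, e, c, d, f))
    m = c / a                 # multiple of row 1 to subtract from row 2
    d2 = d - m * b            # new coefficient of x2 in row 2
    f2 = f - m * e            # new right-hand side of row 2
    x2 = f2 / d2              # solve the one-variable equation
    x1 = (e - b * x2) / a     # substitute back into row 1
    return x1, x2

# 10*x1 + 5*x2 = 80 (value in cents), x1 + x2 = 11 (number of coins)
dimes, nickels = solve_2x2(10, 5, 80, 1, 1, 11)
print(dimes, nickels)
```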


Definition 1.1.5. A solution to a system of linear equations is a sequence of numbers
s_1, s_2, ..., s_n such that the system of eqns is satisfied (i.e., true) when s_i is substituted in
for x_i. In the previous example,

10·5 + 5·6 = 80
   5 +   6 = 11

shows that (x_1, x_2) = (5, 6) is a solution to the system. Geometrically, a solution is a point
where all the graphs intersect. 

A solution set for a system of linear equations is the set of all possible solutions for the 
system. A system with no solutions is said to be inconsistent; a system with at least one 
solution is consistent. 

A system is homogeneous if all constants b_i are equal to 0. A homogeneous system always
has the trivial solution x_1 = x_2 = ... = x_n = 0.


This last definition might prompt you to ask, “How many solutions can a system of 
linear eqns have?” Intuitively, you might expect that every system has exactly one solution, 


but this is not the case. Consider the following systems: 


Example 1.1.6. 


x_1 + x_2 = 2
x_1 - x_2 = 2


This system represents two lines which intersect at the point (2,0). Hence, it has the 


unique solution (2, 0). 


x_1 + x_2 = 2
x_1 + x_2 = 1


This system represents two parallel lines. Since these lines do not intersect, there is no 


solution (s1, $2) which satisfies both equations simultaneously. More intuitively, think of 
this system as being impossible to solve because two numbers cannot sum to two different


values. 


 x_1 + x_2 =  2
-x_1 - x_2 = -2


This system represents the same line two different ways. Since these two lines overlap each 
other, any point on one line is also on the other. Hence, any point on the line is a solution 


to the system. 
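The three cases in Example 1.1.6 can be told apart mechanically. The Python sketch below is not from the notes: the name `classify` is hypothetical, and it assumes two equations in two unknowns, each with at least one nonzero coefficient. It tests whether the coefficient rows, and then the full rows including the constants, are proportional.

```python
def classify(a, b, e, c, d, f):
    """Classify the system a*x1 + b*x2 = e, c*x1 + d*x2 = f.

    Assumes each equation has at least one nonzero coefficient.
    """
    if a * d - b * c != 0:
        return "unique"            # the two lines cross in one point
    # Lines are parallel; they coincide only if the constants match too.
    if a * f - c * e == 0 and b * f - d * e == 0:
        return "infinitely many"   # same line written two ways
    return "none"                  # distinct parallel lines

print(classify(1, 1, 2, 1, -1, 2))    # x1 + x2 = 2,  x1 - x2 = 2
print(classify(1, 1, 2, 1, 1, 1))     # x1 + x2 = 2,  x1 + x2 = 1
print(classify(1, 1, 2, -1, -1, -2))  # x1 + x2 = 2, -x1 - x2 = -2
```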


1.1.2 Solution sets 


In the last example, we saw a system with an infinite solution set (any point on the line 


will work!). How to express this? 


Example 1.1.7. Consider the linear system 


 x_1 + 2x_2 - 3x_3 = -4    (1.1.1)
2x_1 +  x_2 - 3x_3 =  4    (1.1.2)


What’s the first thing you notice about this system? It has two equations, and 3 unknowns. 
So can we still solve it? Well, mostly ... 

Begin by eliminating x_1: multiply (1.1.1) by -2 and add it to the second equation
to obtain

-3x_2 + 3x_3 = 12.    (1.1.3)

Now solve (1.1.3) for x_2 as

x_2 = x_3 - 4.    (1.1.4)

Since this is about as far as we can go in solving this system, we let x_3 = t, where t is a
parameter that can be any number, i.e., t ∈ R or -∞ < t < ∞ or t ∈ (-∞, ∞). Now by
substituting x_3 = t into (1.1.4), we get x_2 = t - 4. Now we rewrite equation (1.1.1) as

x_1 = -4 - 2x_2 + 3x_3
    = -4 - 2(t - 4) + 3t
    = t + 4
and we obtain the solution set (t + 4, t - 4, t), where -∞ < t < ∞. Note that there are an
infinite number of solutions, but not just any three numbers (a, b, c) is a solution of the
system. A solution needs to have the specific form (t + 4, t - 4, t).
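A parametrized solution is easy to spot-check by substitution. The sketch below assumes the system of Example 1.1.7 reads x_1 + 2x_2 - 3x_3 = -4 and 2x_1 + x_2 - 3x_3 = 4, as implied by the elimination step; the function names are hypothetical.

```python
def solution(t):
    """Point of the claimed solution set for parameter value t."""
    return (t + 4, t - 4, t)

def satisfies(x1, x2, x3):
    """Check both equations of the system (as assumed in the lead-in)."""
    return x1 + 2 * x2 - 3 * x3 == -4 and 2 * x1 + x2 - 3 * x3 == 4

# Every sampled parameter value gives a genuine solution...
all_ok = all(satisfies(*solution(t)) for t in range(-5, 6))
# ...but an arbitrary triple generally does not.
origin_ok = satisfies(0, 0, 0)
print(all_ok, origin_ok)
```

Checking finitely many values of t is not a proof, but substituting the symbolic form (t + 4, t - 4, t) into both equations works the same way.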


Definition 1.1.8. A parameter is a variable, usually with a specified range, which remains 
as part of the solution; the solution set is then said to be parameterized. 

An infinite solution set which is described in terms of a parameter is also called a 
parametric representation. A variable which has been set equal to the parameter is called 


a free variable. 


A parametric representation is not unique; it can be written many ways. For example, 


the parametric solution to the system above may also be written as: 


(r, r - 8, r - 4), -∞ < r < ∞        x_1 is a free variable.
(s + 8, s, s + 4), -∞ < s < ∞        x_2 is a free variable.
(u + 2, u - 6, u - 2), -∞ < u < ∞    No free variable.


For systems with more variables, the solution set may have many parameters. A 
particular solution can be obtained from a parameterized solution by substituting in 


certain values of the parameters: 
r = 1 => (1, -7, -3).

Fixing s = -7 gives the same point in R^3.


This example serves to illustrate the general case: for any system of linear equations, 
it is always the case that there is either one unique solution, no solution, or an infinite 
number of solutions. In other terminology, the solution set can consist of one point, it can 
be empty, or it can contain infinitely many points. This is due to the nature of straight 
lines and the ways they can intersect. For example, it is impossible for two straight lines 


to intersect in precisely two places (in flat space). We’ll prove this later on. 


HW] §1.1: 1, 2, 9, 10, 10, 16, 18, 26, 27, 30, 31 
1.2 Matrices

Definition 1.2.1. A matrix is a rectangular array of numbers. An m × n matrix is a
matrix with m rows and n columns:

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
\]


Definition 1.2.2. Each entry a_{ij} in the matrix is a number, where i tells what row the
number is on, and j tells which column it is in. For example, a_{23} is the number in the
second row and third column of the matrix. The subscripts i, j can be thought of as giving


the “address” of an entry within the matrix. 


Definition 1.2.3. Two matrices are equal iff they have the same size (equal dimensions) 


and all of their entries are equal, so a_{ij} = b_{ij} for all i, j.


Definition 1.2.4. If we have an m x n matrix where m = n, then it is called a square 
matrix. For a square matrix, the entries a_{11}, a_{22}, ..., a_{nn} are called the main diagonal or


sometimes just the diagonal. 


Remark. We will discuss how to perform arithmetic operations with matrices shortly, that 
is, how to add two matrices together or what it might mean to multiply two together. 
First, however, we will apply matrices to the task of solving linear systems, and develop 


some motivation for why matrices might be important. 


1.2.1 Types of matrices 


Definition 1.2.5. A matrix with only one row is called a row vector. E.g.: 
a = [a_1 a_2 ... a_n]


b= (3 -1119] 


A matrix with only one column is called a (GUESS WHAT) column vector. E.g.:

\[
a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix},
\qquad
b = \begin{bmatrix} 3 \\ -2 \\ 6.8 \\ -1 \\ 0 \end{bmatrix}
\]


Definition 1.2.6. A matrix with the same number of rows as columns is a square matrix.


Both of the first two examples were square. 


1.2.2 Matrix operations 


Definition 1.2.7. If two matrices A and B are both of the same size, then we define the 


sum of A and B as follows: 


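\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
+
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots &        & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\
a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\
\vdots          & \vdots          &        & \vdots \\
a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn}
\end{bmatrix}
\]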


Remark. This is probably a good time to introduce some shorthand notation for matrices. 


In future, we may write the matrix 


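\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

in the abbreviated form

\[ A = [a_{ij}]. \]

In this notation, the sum of two matrices A = [a_{ij}], B = [b_{ij}] is written

\[ A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}]. \]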


Example 1.2.8. For 


the sum is given by 


(0: S049) aw 10 0 
A+B= = 
ae aes toes ee 32. A 


Note that this definition only makes sense when A and B are the same size. If 


two matrices are of different size, then their sum is undefined. 
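The entrywise definition of the sum, including the same-size requirement, can be sketched in Python. The function name `mat_add` and the sample matrices are assumptions of this example (matrices are stored as lists of rows).

```python
def mat_add(A, B):
    """Entrywise sum [a_ij + b_ij]; defined only for matrices of equal size."""
    same_size = len(A) == len(B) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(A, B)
    )
    if not same_size:
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
total = mat_add(A, B)
print(total)

try:
    mat_add(A, [[1, 2, 3]])      # a 2 x 2 plus a 1 x 3: undefined
    undefined_detected = False
except ValueError:
    undefined_detected = True
print(undefined_detected)
```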


Definition 1.2.9. Scalar multiplication (or “multiplication by a number”, or “multiplica- 


tion by a constant”) of a matrix 


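\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
= [a_{ij}]
\]
by a scalar c is defined by
\[
cA = c \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
= \begin{bmatrix}
ca_{11} & ca_{12} & \cdots & ca_{1n} \\
ca_{21} & ca_{22} & \cdots & ca_{2n} \\
\vdots  & \vdots  &        & \vdots \\
ca_{m1} & ca_{m2} & \cdots & ca_{mn}
\end{bmatrix}
= c[a_{ij}] = [ca_{ij}]
\]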


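Example 1.2.10. If

\[ A = \begin{bmatrix} 4 & 8 & 2 \\ 6 & 8 & 10 \end{bmatrix}, \]

then two scalar multiples of it are

\[
\tfrac{1}{2} A = \begin{bmatrix} 2 & 4 & 1 \\ 3 & 4 & 5 \end{bmatrix},
\qquad
3A = \begin{bmatrix} 12 & 24 & 6 \\ 18 & 24 & 30 \end{bmatrix}.
\]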


Scalar multiplication is good for changing the size of entries in a matrix/vector so as to 


have certain properties. 


Example 1.2.11. Let A = [3 7 2 1]. Then for c = 1/7, the largest entry of cA is 1. For
b = 1/(3 + 7 + 2 + 1) = 1/13, the entries of bA are percentages (or probabilities) associated to A.
If d = 2.54 and the entries of A are measurements in inches, then dA gives the same
measurements in cm.
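The rescalings in Example 1.2.11 can be reproduced with exact fractions. This Python sketch reads the scalars as c = 1/7 (one over the largest entry) and b = 1/13 (one over the sum of the entries); the function name `scalar_multiply` is hypothetical.

```python
from fractions import Fraction

def scalar_multiply(c, row):
    """Multiply every entry of a row vector by the scalar c."""
    return [c * a for a in row]

A = [3, 7, 2, 1]
scaled = scalar_multiply(Fraction(1, max(A)), A)   # largest entry becomes 1
shares = scalar_multiply(Fraction(1, sum(A)), A)   # entries now sum to 1
in_cm = scalar_multiply(2.54, A)                   # inches converted to cm

print(scaled)
print(shares)
print(in_cm)
```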
Now you can take sums and differences of matrices. More generally, 


Definition 1.2.12. A linear combination of matrices A_1, A_2, ..., A_n is an expression

\[ c_1 A_1 + c_2 A_2 + \cdots + c_n A_n = \sum_{i=1}^{n} c_i A_i, \]

where each c_i is a real number (coefficient).
Later, we’ll see that linear combinations of vectors are especially useful. 


Here

\[ \sum_{i=1}^{n} x_i \]

is the standard summation notation for the sum of n (different) things.
For example, if x = (3, 2, 1, 4, 7), then

\[ \sum_{i=2}^{4} x_i = 2 + 1 + 4 = 7. \]


Sums have the following properties: 


\[ \sum_{i=1}^{n} a_i (u_i + v_i) = \sum_{i=1}^{n} a_i u_i + \sum_{i=1}^{n} a_i v_i \]

The transpose of A is denoted by A^T and has entries a^T_{ij} = a_{ji}. So the rows of A are
the columns of A^T and vice versa.

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\implies
A^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
\]


Every matrix has a transpose; the matrix need not be square. 


\[
B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
\implies
B^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\]


The transpose of a row vector is a column vector, and vice versa. 


Vectors as data storage 


Suppose you own a store that sells 100 different products. The inventory of the store
is then a vector x ∈ R^100. Say u ∈ R^100 is your inventory at the beginning of the week,
v ∈ R^100 tells how many of each item was sold in the week, and w ∈ R^100 tells how many
items arrive on the truck with this week’s delivery. Then at the end of the week, your
inventory is u - v + w.


Vectors can also store relational data. A graph is a collection of vertices (nodes/points) 
and edges (line segments showing connectivity/adjacency). The adjacency matrix of a 


graph encodes this data with a 1 in the (i, j)th entry if P_i ~ P_j and a 0 otherwise.


[Figure: a graph on the vertices P_1, P_2, P_3, P_4 together with its 4 × 4 adjacency matrix.]


HW] §1.2: 7(df), 9(cd), 10, 12, 14, 17, 19, 20, 21 
1.3 Matrix multiplication

Definition 1.3.1. The dot product of x and y is

\[ x \cdot y = \sum_{i=1}^{n} x_i y_i. \]
Example 1.3.2. u = (0.31, 0.23, 0.23, 0.23) and v = (96,87, 43,81). Then 


u-v = (0.31)(96) + (0.23)(87) + (0.23)(43) + (0.23)(81) = 78.29 


Example 1.3.3. Let a = (x, 2, 3) and b = (4, 1, 2). If a·b = -4, what is x?

a·b = 4x + 2 + 6 = -4 => 4x = -12 => x = -3.
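Both dot product examples can be checked with a few lines of Python. This is a minimal sketch; the function name `dot` is an assumption of the example, and the variable `grade` is only an illustrative label for the weighted sum in Example 1.3.2.

```python
def dot(x, y):
    """Dot product: the sum of the products of corresponding entries."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

# Example 1.3.2: a weighted sum (weights u, values v).
u = [0.31, 0.23, 0.23, 0.23]
v = [96, 87, 43, 81]
grade = dot(u, v)
print(round(grade, 2))

# Example 1.3.3: a.b = 4x + 2 + 6, so a.b = -4 forces x = (-4 - 8) / 4.
x = (-4 - 8) / 4
print(x, dot([x, 2, 3], [4, 1, 2]))
```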


Definition 1.3.4. The product of two matrices A = [a;;] and B = [b;;] is only defined 
when the number of columns of A is equal to the number of rows of B. Suppose A is an 
m Xn matrix and B is an n x p matrix so that the product AB is well-defined. Then AB 
is defined as follows: 


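\[ AB = [c_{ij}] \quad \text{where} \quad c_{ij} = a_i \cdot b_j = \sum_{k=1}^{n} a_{ik} b_{kj}, \]

with a_i the ith row of A and b_j the jth column of B.

While this formula may look hideous and slightly terrifying, you should not be alarmed. In practice, the entries of
a product are not too difficult to compute, and there is a very simple mnemonic for
remembering which entries from the factor matrices are used: to find the entry in the
ith row and jth column of the product, use the ith row of A and the jth column of B. Using
full-blown matrix notation, we have

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots &        & \vdots \\
a_{i1} & a_{i2} & a_{i3} & \cdots & a_{in} \\
\vdots & \vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1p} \\
b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2p} \\
b_{31} & b_{32} & \cdots & b_{3j} & \cdots & b_{3p} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nj} & \cdots & b_{np}
\end{bmatrix}
=
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1j} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2j} & \cdots & c_{2p} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
c_{i1} & c_{i2} & \cdots & c_{ij} & \cdots & c_{ip} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
c_{m1} & c_{m2} & \cdots & c_{mj} & \cdots & c_{mp}
\end{bmatrix},
\]
where
\[ c_{ij} = (AB)_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + a_{i3} b_{3j} + \cdots + a_{in} b_{nj}. \]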


You can see why A must have the same number of columns as B has rows - otherwise 


these numbers would not match up equally, and the product wouldn’t be well-defined. 
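Example 1.3.5. Consider the matrices

\[
A = \begin{bmatrix} 3 & -2 \\ 2 & 4 \\ 1 & -3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} -2 & 1 & 3 \\ 4 & 1 & 6 \end{bmatrix}.
\]

Since A has 2 columns and B has 2 rows, the product of these two matrices is well-defined:

\[
AB =
\begin{bmatrix} 3 & -2 \\ 2 & 4 \\ 1 & -3 \end{bmatrix}
\begin{bmatrix} -2 & 1 & 3 \\ 4 & 1 & 6 \end{bmatrix}
=
\begin{bmatrix}
3(-2) + (-2)(4) & 3(1) + (-2)(1) & 3(3) + (-2)(6) \\
2(-2) + 4(4)    & 2(1) + 4(1)    & 2(3) + 4(6) \\
1(-2) + (-3)(4) & 1(1) + (-3)(1) & 1(3) + (-3)(6)
\end{bmatrix}
=
\begin{bmatrix} -14 & 1 & -3 \\ 12 & 6 & 30 \\ -14 & -2 & -15 \end{bmatrix}
\]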

Note that B has 3 columns and A has 3 rows, so the product BA is also defined: 


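\[
BA =
\begin{bmatrix} -2 & 1 & 3 \\ 4 & 1 & 6 \end{bmatrix}
\begin{bmatrix} 3 & -2 \\ 2 & 4 \\ 1 & -3 \end{bmatrix}
=
\begin{bmatrix}
(-2)(3) + 1(2) + 3(1) & (-2)(-2) + 1(4) + 3(-3) \\
4(3) + 1(2) + 6(1)    & 4(-2) + 1(4) + 6(-3)
\end{bmatrix}
=
\begin{bmatrix} -1 & -1 \\ 20 & -22 \end{bmatrix}
\]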


This example illustrates a very important point: when we multiply matrices, AB is not 


necessarily equal to BA. In fact, they need not even have the same size! 
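The point that AB and BA can differ, even in size, is easy to confirm by machine. The sketch below implements the row-times-column rule directly; the function name `mat_mul` is hypothetical, and the matrices are taken to be the 3 × 2 matrix A and the 2 × 3 matrix B of Example 1.3.5 (rows (3, -2), (2, 4), (1, -3) and rows (-2, 1, 3), (4, 1, 6)).

```python
def mat_mul(A, B):
    """Product AB, where entry (i, j) is (row i of A) . (column j of B)."""
    n = len(B)                       # number of rows of B
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(A))
    ]

A = [[3, -2], [2, 4], [1, -3]]       # 3 x 2
B = [[-2, 1, 3], [4, 1, 6]]          # 2 x 3

AB = mat_mul(A, B)                   # 3 x 3
BA = mat_mul(B, A)                   # 2 x 2
print(AB)
print(BA)
```

The product gets its number of rows from the first factor and its number of columns from the second, which is why the two results here have different shapes.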


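Example 1.3.6. Let

\[
A = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & 2 \\ 4 & 5 \\ 3 & 6 \end{bmatrix}.
\]

Then we can find the product

\[
BA =
\begin{bmatrix} 1 & 2 \\ 4 & 5 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}
=
\begin{bmatrix} 5 & 8 \\ 17 & 26 \\ 15 & 24 \end{bmatrix}
\]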


because B has 2 columns and A has 2 rows. However, the product AB is not even defined! 


Note that in general, the product matrix gets its height from the first matrix and its 


width from the second. 


Definition 1.3.7. The coefficient matrix of a system of linear equations is the matrix 


whose entries a_{ij} represent the coefficient of the jth unknown in the ith equation.


Example 1.3.8. Given the linear system 


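\[
\begin{aligned}
 x_1 + 2x_2 +  x_3 &= 3 \\
3x_1 -  x_2 - 3x_3 &= -1 \\
2x_1 + 3x_2 +  x_3 &= 4
\end{aligned}
\]

which we solved previously, the coefficient matrix of this system is

\[ \begin{bmatrix} 1 & 2 & 1 \\ 3 & -1 & -3 \\ 2 & 3 & 1 \end{bmatrix} \]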


Definition 1.3.9. The augmented matrix of a system of linear equations is like the


coefficient matrix, but we include the additional column of constants on the far right side. 


Example 1.3.10. The augmented matrix of the system given above is 


\[ \begin{bmatrix} 1 & 2 & 1 & 3 \\ 3 & -1 & -3 & -1 \\ 2 & 3 & 1 & 4 \end{bmatrix} \]

Sometimes augmented matrices are written with a bar to emphasize that they are augmented
matrices:

\[
\left[ \begin{array}{ccc|c}
1 & 2 & 1 & 3 \\
3 & -1 & -3 & -1 \\
2 & 3 & 1 & 4
\end{array} \right]
\]


Example 1.3.11. Note that any term which is missing from an equation (in a system of 


linear equations) must be represented by a 0 in the coefficient matrix: 


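\[
\begin{aligned}
x_1 - x_2 + 2x_3 &= 1 \\
      x_2 -  x_3 &= -3 \\
            4x_3 &= 2
\end{aligned}
\qquad
\left[ \begin{array}{ccc|c}
1 & -1 & 2 & 1 \\
0 & 1 & -1 & -3 \\
0 & 0 & 4 & 2
\end{array} \right]
\]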


1.3.1 Column-by-column, row-by-row 


Above, we saw how to compute just one entry of a product matrix. We can also compute 


just one column, or just one row. 


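jth col of AB = A [jth col of B]

ith row of AB = [ith row of A] B

This means you can compute AB column-by-column as

\[ AB = A \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix} = \begin{bmatrix} Ab_1 & Ab_2 & \cdots & Ab_p \end{bmatrix} \]

where b_j is a column of B, or else row-by-row as

\[ AB = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} B = \begin{bmatrix} a_1 B \\ \vdots \\ a_m B \end{bmatrix} \]

where a_i is a row of A.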
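Let
\[
A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix},
\qquad
x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.
\]
Then
\[
Ax = \begin{bmatrix} a_{11} x_1 + \cdots + a_{1n} x_n \\ \vdots \\ a_{n1} x_1 + \cdots + a_{nn} x_n \end{bmatrix}
= x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{bmatrix}.
\]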

This shows two nifty things. 

(1) A matrix times a vector is a linear combination of vectors. More specifically, the
product Ax can be represented as a linear combination of the columns of A, where the
coefficients are the entries of x.
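Observation (1) can be illustrated by computing Ax in two ways: row by row, and as a combination of the columns of A. This Python sketch uses hypothetical function names and an arbitrary 2 × 2 example.

```python
def mat_vec(A, x):
    """Ax computed row by row: entry i is (row i of A) . x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def column_combination(A, x):
    """Ax computed as x_1*(column 1) + ... + x_n*(column n)."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            result[i] += xj * A[i][j]   # add x_j times entry i of column j
    return result

A = [[1, 2], [3, 4]]
x = [5, 6]
by_rows = mat_vec(A, x)
by_columns = column_combination(A, x)   # 5*(1, 3) + 6*(2, 4)
print(by_rows, by_columns)
```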


(2) A system of linear equations can be encoded in matrices. Let 


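\[ b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}. \]
Then
\[
Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} a_{11} x_1 + \cdots + a_{1n} x_n \\ \vdots \\ a_{n1} x_1 + \cdots + a_{nn} x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = b
\]

shows that Ax = b is the same thing as the system

\[
\begin{aligned}
a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\
&\;\;\vdots \\
a_{n1} x_1 + \cdots + a_{nn} x_n &= b_n.
\end{aligned}
\]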


HW] §1.3: 6, 7, 13, 15, 19, 33, 34, 36, 43, 48, 49 
1.4 Algebraic properties of matrix operations


Remark. Mathematical thought: study an object until one can extract the salient features 


of the system and develop rules about how they interact. 


1. First: basic arithmetic is learning specific sums/times tables. 


2. Later: properties of numbers in algebra class. 


Now, we do the same with matrices. 


1. Algebra is the distillation of properties of numbers and how they behave with respect 


to the operations of addition and multiplication. 


2. Linear Algebra is the distillation of properties of matrices and how they behave under 


addition and multiplication. 


(Also, other operations unique to matrices.) 


Algebraic Properties of Scalars

              additive                        multiplicative
Commutative   a + b = b + a                   ab = ba
Associative   a + (b + c) = (a + b) + c       a(bc) = (ab)c
Identity      ∃!b s.t. a + b = a  (b = 0)     for a ≠ 0, ∃!b s.t. a·b = a  (b = 1)
Inverses      ∃!b s.t. b + a = 0  (b = -a)    for a ≠ 0, ∃!b s.t. b·a = 1  (b = 1/a)

Mixed additive/multiplicative properties:

Distributive  a(b + c) = ab + ac,  (a + b)c = ac + bc
Zero          a·0 = 0,  ab = 0 => a = 0 or b = 0


Even if the names are not familiar, the properties are. Now contrast this with the rules 


governing matrices (c,d are scalars, A, B,C are matrices): 


Matrix Identities 


What are identities for matrices? 
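• Additive identity:

\[
0_{mn} = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}
\]

• Multiplicative identity:

\[
I_n = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\]

Theorem 1.4.1. If R is a square matrix in reduced row-echelon form, then R either has
a row of zeros, or else R is the identity matrix.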


Algebraic Properties of Matrices 


              additive                              multiplicative
Commutative   A + B = B + A                         AB ≠ BA (in general)
Associative   A + (B + C) = (A + B) + C             A(BC) = (AB)C
Identity      ∃!B s.t. A + B = A, B = 0_{mn}        ∃!B s.t. AB = A, B = I_n
Inverses      ∃!B s.t. A + B = 0_{mn}, B = -1·A     (sometimes) ∃!B s.t. AB = I_n, B = A^{-1}

Mixed additive/multiplicative properties:

Distributive  A(B + C) = AB + AC,  (A + B)C = AC + BC
Zero          A 0_{mn} = 0_{mn},  AB = 0_{mn} ⇏ A = 0_{mn} or B = 0_{mn}

Mixed scalar/matrix properties:

Associative   (cd)A = c(dA),  A(cB) = c(AB) = (cA)B
Distributive  c(A + B) = cA + cB,  (c + d)A = cA + dA
Zero          cA = 0_{mn} => c = 0 or A = 0_{mn}




Note the special cases: 
(1) matrix multiplication is NOT commutative, 
(2) multiplicative identity is only defined for SQUARE matrices, 
(3) multiplicative inverses do NOT always exist, and 
(4) there ARE zero-divisors (so cancellation laws fail). 


Example 1.4.2. Example of (1). 


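\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 1 \\ 3 & 2 \end{bmatrix}, \quad
AB = \begin{bmatrix} 3 & 2 \\ 6 & 4 \end{bmatrix}
\neq
BA = \begin{bmatrix} 0 & 3 \\ 0 & 7 \end{bmatrix}
\]

Example 1.4.3. Example of (4). Suppose

\[
A = \begin{bmatrix} 2 & -1 \\ 2 & -1 \end{bmatrix}, \quad \text{and} \quad
B = \begin{bmatrix} 1 & -1 \\ 2 & -2 \end{bmatrix}
\]

so that we have

\[
AB = \begin{bmatrix} 2 & -1 \\ 2 & -1 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ 2 & -2 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0_{22}
\]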


Remark. It is precisely because of this last fact that the familiar Law of Cancellation does 


NOT hold for matrices. For scalars, we have the Law of Cancellation: 


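for a ≠ 0, ab = ac => b = c

For matrices, it is not true in general that

AC = AD => C = D.

Let
\[
C = \begin{bmatrix} 1 & -2 \\ -2 & 0 \end{bmatrix}, \quad
D = \begin{bmatrix} 0 & 3 \\ -4 & 10 \end{bmatrix}.
\]
Then, with A as in Example 1.4.3,
\[
AC = \begin{bmatrix} 4 & -4 \\ 4 & -4 \end{bmatrix} = AD, \quad \text{but } C \neq D.
\]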
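Zero divisors and the failure of cancellation can be verified directly. The Python sketch below uses a hypothetical helper `mat_mul` and takes the matrices to be A with rows (2, -1), (2, -1), B with rows (1, -1), (2, -2), C with rows (1, -2), (-2, 0), and D with rows (0, 3), (-4, 10).

```python
def mat_mul(A, B):
    """Product AB via the row-times-column rule (sizes assumed compatible)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

A = [[2, -1], [2, -1]]
B = [[1, -1], [2, -2]]
C = [[1, -2], [-2, 0]]
D = [[0, 3], [-4, 10]]

AB = mat_mul(A, B)   # zero matrix although neither factor is zero
AC = mat_mul(A, C)
AD = mat_mul(A, D)
print(AB)
print(AC == AD, C == D)   # equal products, yet different right factors
```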
We do, however, have the following result: if C is an invertible matrix, then

AC = BC => A = B and
CA = CB => A = B


1.4.1 The transpose 


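Definition 1.4.4. The transpose of a matrix A is the matrix A^T obtained by interchanging
rows for columns. This corresponds to reflecting across the diagonal:

\[ (A^T)_{ij} = A_{ji}. \]

Example 1.4.5.

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
\implies
A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\]

\[
B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\implies
B^T = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
\]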


Note that it is precisely the diagonal entries which remain fixed. 
1 2 
C= = CS 


So it is possible for a matrix to be its own transpose. 


Definition 1.4.6. For a square matrix A, when A^T = A, we say A is symmetric.


(Compare to earlier defn.) 


Theorem 1.4.7. Some algebraic properties of transposition, for matrices of appropriate sizes:
1. $(A^T)^T = A$
2. $(A+B)^T = A^T + B^T$
3. $(cA)^T = c(A^T)$
4. $(AB)^T = B^T A^T$

Proof. Proof of (4). Let $AB = [c_{ij}]$. Then the $(i,j)$th entry of $(AB)^T$ is
\[
\begin{aligned}
c^T_{ij} &= c_{ji} && \text{by defn of transpose} \\
&= \sum_{k=1}^{n} a_{jk} b_{ki} && \text{by defn of matrix mult.}
\end{aligned}
\]
On the other hand, $B^T A^T$ has $(i,j)$th entry
\[
\begin{aligned}
\sum_{k=1}^{n} b^T_{ik} a^T_{kj} &= \sum_{k=1}^{n} b_{ki} a_{jk} && \text{by defn of transpose} \\
&= \sum_{k=1}^{n} a_{jk} b_{ki} && \text{by commutativity of scalar mult.}
\end{aligned}
\]
Together, these show the $(i,j)$th entry of $(AB)^T$ is equal to the $(i,j)$th entry of $B^T A^T$ for any $i,j$, and hence $(AB)^T = B^T A^T$.
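As a sanity check on property 4, the identity $(AB)^T = B^T A^T$ can be tested on a small non-square example. This is only a sketch: the helper names `transpose` and `matmul` and the particular matrix `B` are assumptions chosen for illustration.

```python
def transpose(M):
    """Interchange rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]      # 3 x 2 (arbitrary illustrative choice)

lhs = transpose(matmul(A, B))             # (AB)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T
print(lhs == rhs)
```

Note that the order reverses: $A^T B^T$ is a $3 \times 3$ matrix here, so it could not equal the $2 \times 2$ matrix $(AB)^T$.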


HW] §1.4: 5, 8, 20, 22, 23, 24, 27, 32, 37, 38 
1.5 Special matrices


We’ve seen one special kind of matrix already — an augmented matrix. This is the matrix 
associated with a system of linear equations. There are many other special kinds of 


matrices. 


Example 1.5.1. The following 4 × 4 matrix gives the airline distances between the indicated cities (in miles).

            London   Madrid   New York   Tokyo
London           0      785       3469    5959
Madrid         785        0       3593    6706
New York      3469     3593          0    6757
Tokyo         5959     6706       6757       0


What special properties does this matrix have? Why? 


Example 1.5.2. Suppose you send your minions to do a poll at the supermarket and ask 
customers which type of soda pop they bought that week. 

(Let’s assume for the sake of this example that there are only four kinds of pop, and 
everyone buys exactly one type in a given week — these conditions can be removed but it 
will needlessly complicate the example for now.) 

After several weeks, your minions present you with a report indicating how likely 


someone is to buy one type, based on what they bought last time. 


              Cola   Root beer   Orange   Lemon-lime
Cola          0.30        0.30     0.15         0.25
Root beer     0.40        0.10     0.30         0.20        (1.5.1)
Orange        0.25        0.25     0.25         0.25
Lemon-lime    0.20        0.20     0.20         0.40


What special properties does this matrix have? Why? 
Later on, we’ll see how certain properties of this matrix (the eigenvectors and eigenvalues) 
can tell you what percentage of the population is drinking what, at any time in the distant 


future. 


Definition 1.5.3. A matrix is stochastic if all entries are nonnegative, and the sum of each row is 1.


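Suppose that $u$ represents the present state of affairs, so $u = [1\ 0\ 0\ 0]$ if everyone is drinking Cola, etc. Next week,

\[
[1\ 0\ 0\ 0]
\begin{bmatrix}
0.30 & 0.30 & 0.15 & 0.25 \\
0.40 & 0.10 & 0.30 & 0.20 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.20 & 0.20 & 0.20 & 0.40
\end{bmatrix}
= [0.30\ \ 0.30\ \ 0.15\ \ 0.25].
\]

The week after,

\[
[1\ 0\ 0\ 0]
\begin{bmatrix}
0.30 & 0.30 & 0.15 & 0.25 \\
0.40 & 0.10 & 0.30 & 0.20 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.20 & 0.20 & 0.20 & 0.40
\end{bmatrix}^2
= [0.30\ \ 0.30\ \ 0.15\ \ 0.25]
\begin{bmatrix}
0.30 & 0.30 & 0.15 & 0.25 \\
0.40 & 0.10 & 0.30 & 0.20 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.20 & 0.20 & 0.20 & 0.40
\end{bmatrix}
= [0.2975\ \ 0.2075\ \ 0.2225\ \ 0.2725].
\]

After 6 or more weeks (so $n \geq 6$),

\[
\begin{bmatrix}
0.30 & 0.30 & 0.15 & 0.25 \\
0.40 & 0.10 & 0.30 & 0.20 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.20 & 0.20 & 0.20 & 0.40
\end{bmatrix}^n
\approx
\begin{bmatrix}
0.2827 & 0.2175 & 0.2185 & 0.2813 \\
0.2827 & 0.2175 & 0.2185 & 0.2813 \\
0.2827 & 0.2175 & 0.2185 & 0.2813 \\
0.2827 & 0.2175 & 0.2185 & 0.2813
\end{bmatrix},
\]

so the customer distribution has stabilized at (0.2827, 0.2175, 0.2185, 0.2813), regardless of the initial state.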
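The week-by-week evolution can be simulated by repeatedly multiplying a row vector by the matrix (1.5.1). The sketch below uses plain Python floats; the names `P`, `vecmat`, and `evolve` are assumptions introduced for illustration, and 20 weeks is an arbitrary choice of "distant future".

```python
P = [[0.30, 0.30, 0.15, 0.25],
     [0.40, 0.10, 0.30, 0.20],
     [0.25, 0.25, 0.25, 0.25],
     [0.20, 0.20, 0.20, 0.40]]

def vecmat(u, M):
    """Row vector times matrix: the distribution one week later."""
    return [sum(u[i] * M[i][j] for i in range(len(u)))
            for j in range(len(M[0]))]

def evolve(u, M, weeks):
    """Apply the weekly transition `weeks` times."""
    for _ in range(weeks):
        u = vecmat(u, M)
    return u

week1 = evolve([1, 0, 0, 0], P, 1)
week2 = evolve([1, 0, 0, 0], P, 2)
from_cola = evolve([1, 0, 0, 0], P, 20)    # everyone starts on Cola
from_lemon = evolve([0, 0, 0, 1], P, 20)   # everyone starts on Lemon-lime

print(week1, week2)
print(from_cola, from_lemon)
```

Starting from two different initial states and comparing the long-run vectors is a quick way to see the "regardless of the initial state" claim in action.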


1.5.1 Inverses of Matrices 


Definition 1.5.4. A matrix $A$ is invertible iff it has an inverse, that is, a matrix $B$ such that $AB = BA = I$. A matrix which is not invertible is singular.


Theorem 1.5.5. A 2 × 2 matrix
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]
is invertible iff $ad - bc \neq 0$. In this case,
\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{1.5.2}
\]


Note: this only works for 2 x 2 matrices. (More on this later.) 
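Formula (1.5.2) translates directly into a few lines of code. The following is a minimal sketch using exact rational arithmetic; the function name `inverse_2x2` is an assumption for illustration, and the input is assumed to have integer or `Fraction` entries.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Invert [[a, b], [c, d]] via formula (1.5.2), or raise if ad - bc = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: ad - bc = 0")
    f = Fraction(1) / det          # the scalar 1 / (ad - bc)
    return [[d * f, -b * f],
            [-c * f, a * f]]

print(inverse_2x2([[1, 2], [3, 4]]))
```

Using `Fraction` rather than floats keeps entries such as 3/2 exact, which makes it easy to verify $AA^{-1} = I$ by hand.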


Example 1.5.6. The inverse of A = [} —7] is 


To check, multiply 


ee w 
| | 
ee bo 
ao 
| | 
w i) 


ATA= =, 


The most important Properties of Inverses are 


Theorem 1.5.7. Assume that both A and B are invertible. Then 


Note: this shows that if A and B are invertible, then AB is also invertible. In fact, any 


product of invertible matrices is invertible. 


Example 1.5.8. Let A = [j! ?]. Then 


ara [8] = 4) = N= [8 41- 


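Proof of (4). To show $(A^T)^{-1} = (A^{-1})^T$, we need to show

\[ A^T (A^{-1})^T = (A^{-1})^T A^T = I. \]

First,
\[
\begin{aligned}
A^T (A^{-1})^T &= (A^{-1} A)^T && (BC)^T = C^T B^T \\
&= I^T = I && A^{-1} A = I
\end{aligned}
\]
Next,
\[
\begin{aligned}
(A^{-1})^T A^T &= (A A^{-1})^T && (BC)^T = C^T B^T \\
&= I^T = I && A A^{-1} = I
\end{aligned}
\]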


Theorem 1.5.9. If the inverse of a matrix exists, then it is unique. 


Proof. Suppose $B$ and $C$ are both inverses of $A$. Since $B$ is an inverse of $A$, we have $AB = BA = I_n$. Since $C$ is an inverse of $A$, we have $AC = CA = I_n$. Together,

\[
\begin{aligned}
B &= B I_n && \text{by multiplicative identity property} \\
&= B(AC) && \text{by hypothesis: $C$ is an inverse of $A$} \\
&= (BA)C && \text{by associativity of matrix mult} \\
&= I_n C && \text{by hypothesis: $B$ is an inverse of $A$} \\
&= C && \text{by multiplicative identity property}
\end{aligned}
\]


Powers of a matrix 


By multiplicative associativity for matrices, it makes sense to multiply the same square matrix with itself multiple times; in other words, exponents are well defined for matrices:

\[ A^k := \underbrace{A \cdot A \cdots A}_{k \text{ times}}. \]

If $A$ is invertible,

\[ A^{-k} := \underbrace{A^{-1} \cdot A^{-1} \cdots A^{-1}}_{k \text{ times}}. \]

Theorem 1.5.10. $A^n A^m = A^{n+m} = A^m A^n$ and $(A^n)^m = (A^m)^n = A^{mn}$.

Note however, that $A^k \neq [a_{ij}^k]$ in general, and there is no general explicit formula for $A^k$; it must be worked out by hand.


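In some special cases, one can see how the pattern works:

\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\implies
A^2 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1+1 & 1+1 \\ 1+1 & 1+1 \end{bmatrix}
= \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}.
\]
Then
\[
A^3 = A^2 A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}.
\]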
Also,
\[
B = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
\implies
B^k = \begin{bmatrix} 1 & 2k \\ 0 & 1 \end{bmatrix}.
\]


Putting the above theorems together, you get things like,
$A$ is invertible $\implies$ $A^k$ has an inverse: $(A^k)^{-1} = A^{-k} = (A^{-1})^k$.


Convention: $A^0 = I$.


Other functions of a matrix 


Let f(z) = 27. We just saw that for a square matrix A, f(A) is well-defined. Similarly, 
given any polynomial 


\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \]

one can define

\[ p(A) = a_0 I + a_1 A + a_2 A^2 + \cdots + a_n A^n. \]


Example 1.5.11. Let p(x) = 2 — 3a? and A= [? 77]. Then 
p(A) = 22 3[7=1] [7-3] = (62) -3[2 2] = [26 3] 


One can even define the exponential of a square matrix, using the power series representation of $e^x$:

\[ e^A := \sum_{k=0}^{\infty} \frac{1}{k!} A^k = I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \cdots \]

Here, the first term of the sum is $I$, following the convention $A^0 = I$.


1.5.2 Diagonal matrices 


Definition 1.5.12. A diagonal matrix is a square matrix for which all non-diagonal entries 


are 0. E.g.: 
DAG. Gh mG 2 0) 
Os SO 1 
Oi Ooo. | ° 0) 2 
Or: i 1 


Diagonal matrices may look too simple to be useful, but they are actually incredibly 
useful. You will spend a lot of time wishing all matrices were diagonal, and some time in 


Chapter 7 trying to make matrices diagonal. 


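A general diagonal matrix looks like

\[
D = \begin{bmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{bmatrix}
\]
and its powers look like
\[
D^k = \begin{bmatrix}
d_1^k & 0 & \cdots & 0 \\
0 & d_2^k & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n^k
\end{bmatrix}
\]

Here, $k \in \mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$ (negative powers require every $d_i \neq 0$).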


Products with diagonal matrices are easy to compute:
\[
\begin{bmatrix} m & n & o \\ p & q & r \\ s & t & u \end{bmatrix}
\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}
=
\begin{bmatrix} am & bn & co \\ ap & bq & cr \\ as & bt & cu \end{bmatrix}
\]


1.5.3 Triangular matrices 


Definition 1.5.13. An upper triangular matrix is a square matrix which has only zeros below the diagonal, that is,

\[ i > j \implies a_{ij} = 0. \]

A lower triangular matrix is a square matrix which has only zeros above the diagonal, that is,

\[ i < j \implies a_{ij} = 0. \]


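Example 1.5.14. General lower, upper triangular matrices look like

\[
L = \begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix}
\qquad
U = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}
\]

Here, $a, b, c, d, e, f \in \mathbb{R}$, and any (or all) of them can be 0.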


Theorem 1.5.15. The transpose of an upper triangular matrix is lower triangular, and vice versa.


Proof. Clear by inspection. 


Theorem 1.5.16. The product of two lower triangular matrices is a lower triangular matrix. (Similarly for upper triangular.)


Proof. Suppose $A = [a_{ij}]$ and $B = [b_{ij}]$ are both $n \times n$ lower triangular matrices. By definition, this means that

\[ i < j \implies a_{ij} = b_{ij} = 0. \]

Let $C = AB = [c_{ij}]$ be the product matrix. Then by the defn of matrix mult,

\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}. \]

To show that $C$ is also lower triangular, we need to see that for $i < j$, $c_{ij} = 0$. Then for $c_{ij}$ where $i < j$, we have

\[
\begin{aligned}
c_{ij} &= a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{i(j-1)} b_{(j-1)j} + a_{ij} b_{jj} + \cdots + a_{in} b_{nj} \\
&= a_{i1} \cdot 0 + a_{i2} \cdot 0 + \cdots + a_{i(j-1)} \cdot 0 + 0 \cdot b_{jj} + \cdots + 0 \cdot b_{nj} \\
&= 0.
\end{aligned}
\]


Theorem 1.5.17. A triangular matrix is invertible iff its diagonal entries are all nonzero. In this case, the inverse is also triangular (same type).


1.5.4 Symmetric matrices 


$A$ is a symmetric matrix iff $A^T = A$, that is, iff $a_{ij} = a_{ji}$ for all $i,j$.

$A$ is a skew-symmetric matrix iff $A^T = -A$, that is, iff $a_{ij} = -a_{ji}$ for all $i,j$.

Theorem 1.5.18. If $A$ and $B$ are symmetric, then
(i) $A^T$ is symmetric,
(ii) $A + B$ and $A - B$ are symmetric, and
(iii) $kA$ is symmetric, for $k \in \mathbb{R}$.


Proof. Homework! 


Theorem 1.5.19. Suppose $A$ is symmetric. If $A$ is invertible, then $A^{-1}$ is also symmetric.


Proof. Assume that $A$ is symmetric and invertible. We need to show that $A^{-1}$ is symmetric.

\[
\begin{aligned}
(A^{-1})^T &= (A^T)^{-1} && \text{Thm} \\
&= A^{-1} && A = A^T \text{ by hypothesis (symmetric)}
\end{aligned}
\]

So $A^{-1}$ is its own transpose; it must be symmetric.


Theorem 1.5.20. If $A$ is invertible, then $AA^T$ and $A^T A$ are also invertible.


Proof. Assume $A$ is invertible. Then $A^T$ is also invertible, by Thm. Then note that products of invertible matrices are invertible.


Theorem 1.5.21. $AA^T$ and $A^T A$ are always symmetric.

Proof. For the first one,
\[ (AA^T)^T = (A^T)^T A^T = AA^T. \]

For the second one,

\[ (A^T A)^T = A^T (A^T)^T = A^T A. \]


Recurrence relations 


The Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, 21, ... is generated by the recurrence relation $u_n = u_{n-1} + u_{n-2}$ with the initial conditions $u_0 = u_1 = 1$. So each term of the sequence can be written as the sum of the previous two.


This sum can be converted into a product using matrices: 


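A closed-form solution for this recurrence is

\[
u_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{n+1} - \left( \frac{1-\sqrt{5}}{2} \right)^{n+1} \right].
\]

Also, $\frac{1+\sqrt{5}}{2}$ is the largest root of $x^2 = x + 1$.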
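One common way to turn the Fibonacci recurrence into a matrix product is sketched below. It assumes the standard matrix form $\begin{bmatrix} u_{n+1} \\ u_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} u_n \\ u_{n-1} \end{bmatrix}$, which may differ in layout from the form intended in these notes; the names `matmul`, `matpow`, and `fib` are likewise assumptions.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matpow(M, k):
    """Compute M^k for k >= 0 by repeated multiplication (M^0 = I)."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, M)
    return result

def fib(n):
    """Return u_n, where u_0 = u_1 = 1 and u_n = u_{n-1} + u_{n-2}."""
    Mn = matpow([[1, 1], [1, 0]], n)
    # [u_{n+1}, u_n]^T = M^n [u_1, u_0]^T; read off the second component.
    return Mn[1][0] * 1 + Mn[1][1] * 1

print([fib(n) for n in range(8)])
```

The point of the matrix form is that computing $u_n$ reduces to computing a matrix power, which is where diagonalization later becomes useful.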


HW] §1.5: 4, 8, 22, 31, 34-38, 50, 59 


Also: prove that a diagonal matrix is symmetric.
So start by assuming that $A$ is a diagonal matrix. Now you need to prove $a_{ij} = a_{ji}$. Try examining cases.

Also: compute $e^{tA}$, where $t \in \mathbb{R}$ is a parameter and

\[
A = \begin{bmatrix} 0 & 3 & 4 \\ 0 & 0 & 6 \\ 0 & 0 & 0 \end{bmatrix}
\]

Use the power series expansion for $e^x$ and you should obtain a 3 × 3 matrix whose entries are functions of $t$.
1.6 Matrix transformations


1.6.1 Functions from $\mathbb{R}^n$ to $\mathbb{R}$


$f : \mathbb{R}^n \to \mathbb{R}$ is a function which eats an $n$-vector and spits out a number, i.e., it takes $n$ inputs and gives one output.


Example 1.6.1. $f : \mathbb{R}^n \to \mathbb{R}$ by $f(x) = \|x\|$.

$f_a : \mathbb{R}^n \to \mathbb{R}$ by $f_a(x) = a \cdot x = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$, here $a \in \mathbb{R}^n$ is fixed.

$f : \mathbb{R}^{2n} \to \mathbb{R}$ by $f(x, y) = x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$.

$f : \mathbb{R}^3 \to \mathbb{R}$ by $f(x) = f(x_1, x_2, x_3) = x_1$. (Coordinate map)


1.6.2 Functions from $\mathbb{R}^n$ to $\mathbb{R}^m$


$f : \mathbb{R}^n \to \mathbb{R}^m$ is a function which eats an $n$-vector and spits out an $m$-vector, i.e., it takes $n$ inputs and gives $m$ outputs. These are often called maps or transformations. When $m = n$, they are called operators.


Example 1.6.2. $f : \mathbb{R}^n \to \mathbb{R}^m$ by $f(x) = c$. (Constant at $c \in \mathbb{R}^m$) Esp.: $f(x) = 0$.

$f : \mathbb{R}^n \to \mathbb{R}^n$ by $f(x) = x$. (Identity operator)

$f_a : \mathbb{R}^n \to \mathbb{R}^n$ by $f_a(x) = ax$, here $a \in \mathbb{R}$ is fixed.

$f : \mathbb{R}^3 \to \mathbb{R}^3$ by $f(x) = f(x_1, x_2, x_3) = (x_1, 0, 0)$. (Projection)


If we have a bunch of real-valued functions on $\mathbb{R}^n$

\[
\begin{aligned}
f_1(x_1, x_2, \dots, x_n) &= y_1, \\
f_2(x_1, x_2, \dots, x_n) &= y_2, \\
&\ \ \vdots \\
f_m(x_1, x_2, \dots, x_n) &= y_m,
\end{aligned}
\]

then taken together, they define a transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ via

\[ T(x_1, x_2, \dots, x_n) = (y_1, y_2, \dots, y_m). \]

For $x = (x_1, x_2, \dots, x_n)$, this is written:

\[ T(x) = (f_1(x), f_2(x), \dots, f_m(x)). \]

Definition 1.6.3. A linear function $f$ is one for which
1. $f(x + y) = f(x) + f(y)$, for all vectors $x, y$, and
2. $f(ax) = a f(x)$, for all vectors $x$ and every $a \in \mathbb{R}$.


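Theorem 1.6.4. For $T(x) = (f_1(x), f_2(x), \dots, f_m(x))$ defined as above, $T$ is a linear transformation iff $f_i(x_1, x_2, \dots, x_n)$ is a linear combination of its variables, for each $i = 1, 2, \dots, m$.


This means

\[ f_i(x_1, x_2, \dots, x_n) = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n. \]

Consequently, if $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, then $T(x)$ can be written as

\[
T(x) = \begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= Ax.
\]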

Later, we’ll prove that linear functions are essentially the same thing as functions 


defined by matrix multiplication: 
Theorem 1.6.5. f is a linear function iff f(x) = Ax for some matrix A. 


Definition 1.6.6. Given a matrix $A$, write $T_A$ for the associated function defined by multiplying against this matrix, i.e.,

\[ T_A(x) := Ax. \]

Similarly, $[T] = A$ is the standard matrix for $T$. (Later: there are many matrices for $T$.)


1.6.3. Some operators 


To determine what kind of transformation is effected by a given T, it may help to look at 


the image of a simple figure under the action of T. The unit cube is often helpful. 


Definition 1.6.7. The unit cube in $\mathbb{R}^n$ is

\[
Q_n := [0,1]^n = \underbrace{[0,1] \times [0,1] \times \cdots \times [0,1]}_{n \text{ times}}
= \{x \in \mathbb{R}^n : 0 \leq x_i \leq 1,\ i = 1, 2, \dots, n\}.
\]

So $Q_1 = [0,1]$ is the unit interval, $Q_2 = [0,1]^2$ is the unit square, etc.


Example 1.6.8. Let 


Now define f(x) := Az, so f ([$]) = [-%s]. 


Example 1.6.9. Let 


Now define g(x) := Az, so g([$]) = [<2]. 


Note that h(x) = —g(a) corresponds to rotation in the other direction (cw). 


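Example 1.6.10 (Reflection). Consider $T_A : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[
A = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Then
\[
T_A(x) = Ax = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

is reflection in the $x_1$-direction, that is, it is the symmetric image of $x$ reflected through the $x_2 x_3$-plane.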


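Consider $T_B : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[
B = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\]
Then
\[
T_B(x) = Bx = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -x_1 \\ -x_2 \\ -x_3 \end{bmatrix}
\]

is reflection in the origin, that is, it is the symmetric image of $x$ on the other side of the line that passes through $x$ and 0.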


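Consider $T_C : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[
C = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

This matrix is obtained by a row swap, and has the effect of interchanging the first two coordinates:

\[
T_C(x) = Cx = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_2 \\ x_1 \\ x_3 \end{bmatrix}
\]

is reflection in the vertical diagonal plane between the $x_1$ and $x_2$ axes.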


(SKETCH lower triangular prism before & after) 


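Example 1.6.11 (Projection operators). Consider $T_A : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
Then
\[
T_A(x) = Ax = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix}
\]

is projection to the $x_1$-axis.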
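Consider $T_B : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[
B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Then
\[
T_B(x) = Bx = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ x_2 \\ x_3 \end{bmatrix}
\]

is projection to the $x_2 x_3$-plane.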


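Example 1.6.12 (Rotation). Consider $T_C : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[
C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
\]

This fixes the $x_1$-coordinate but rotates the $x_2 x_3$-plane.

\[
T_C(x) = Cx = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_2 \cos\theta - x_3 \sin\theta \\ x_2 \sin\theta + x_3 \cos\theta \end{bmatrix}
\]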
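The action of this rotation can be tried out on concrete vectors. The sketch below applies the coordinate formula directly; the function name `rotate_about_x1` is an assumption made for illustration.

```python
import math

def rotate_about_x1(x, theta):
    """Rotate x = (x1, x2, x3) by angle theta in the x2x3-plane, fixing x1."""
    x1, x2, x3 = x
    return (x1,
            x2 * math.cos(theta) - x3 * math.sin(theta),
            x2 * math.sin(theta) + x3 * math.cos(theta))

# A quarter turn sends the x2-direction to the x3-direction.
y = rotate_about_x1((1.0, 1.0, 0.0), math.pi / 2)
print(y)
```

Because of floating-point rounding, the middle coordinate comes out as a tiny number rather than an exact 0, so comparisons should use a tolerance.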


1.6.4 Dilation and contraction 


Definition 1.6.13. $T(x) = kx$ is called a contraction iff $0 < k < 1$ and a dilation iff $k > 1$.


A contraction uniformly compresses $\mathbb{R}^n$ toward the origin, and a dilation uniformly expands $\mathbb{R}^n$ away from the origin. The standard matrix for a contraction or dilation is a diagonal matrix with ALL diagonal entries equal to $k$, that is, a scalar multiple of the identity:

\[
[T] = [T_{kI}] = kI = \begin{bmatrix} k & & 0 \\ & \ddots & \\ 0 & & k \end{bmatrix}
\]
Example 1.6.14 (Contraction). Consider $T_C : \mathbb{R}^3 \to \mathbb{R}^3$ defined by


1 0 0 
C=|0 5 Ol =5!. 
003 
Then 
4 0 0 Ly at1 ; Ly 
To (x) =Cr= 0 4 0 v2 = 5X2 = 5) v2 
0 O 5 X3 5X3 X3 


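Example 1.6.15 (Dilation). Consider $T_D : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[
D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix} = 3I.
\]

Then

\[
T_D(x) = Dx = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 3x_1 \\ 3x_2 \\ 3x_3 \end{bmatrix}
= 3 \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]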


1.6.5 Compositions 


Definition 1.6.16. If $T_A : \mathbb{R}^n \to \mathbb{R}^k$ and $T_B : \mathbb{R}^k \to \mathbb{R}^m$, then the composition is the transformation $T_B \circ T_A = T_B(T_A) : \mathbb{R}^n \to \mathbb{R}^m$.


Theorem 1.6.17. The composition of two linear transforms is a linear transform.


Proof. $T_B \circ T_A(x) = T_B(T_A(x)) = T_B(Ax) = B(Ax) = (BA)x$.

Corollary 1.6.18. $T_B \circ T_A = T_{BA}$. Also, $[T_B \circ T_A] = [T_B][T_A]$.


By repeated application,

\[ T_C \circ T_B \circ T_A = T_{CBA}, \text{ etc.} \]

Corollary 1.6.19. Composition is not commutative.


Proof. If it were always true that $T_{BA} = T_B \circ T_A = T_A \circ T_B = T_{AB}$, this would imply that $AB = BA$ for all square matrices, which is false.
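Both corollaries can be illustrated with small matrices. In the sketch below, the matrices `A`, `B`, `R` (a quarter-turn rotation of the plane) and `S` (a reflection across the horizontal axis) are illustrative assumptions, as are the helper names.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, x):
    """Apply T_M to the vector x, i.e. compute Mx."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[1, 2], [0, 1], [3, 0]]   # T_A : R^2 -> R^3
B = [[1, 0, 1], [2, 1, 0]]     # T_B : R^3 -> R^2
x = [5, -1]

composed = matvec(B, matvec(A, x))     # T_B(T_A(x))
single = matvec(matmul(B, A), x)       # T_{BA}(x)

R = [[0, -1], [1, 0]]   # rotate the plane by 90 degrees
S = [[1, 0], [0, -1]]   # reflect across the horizontal axis
print(composed == single, matmul(R, S) == matmul(S, R))
```

Rotating and then reflecting lands in a different place than reflecting and then rotating, which is the matrix statement $RS \neq SR$.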


Example 1.6.20. The composition of two rotations is always another rotation, but these 


don’t usually commute. (EXAMPLE: chalk brush) 


HW] §1.6: 5, 7, 8, 18, 19 
Describe in words the geometric action of f in #5. 


HW] §1.7: 3, 5, 14, 15 


Chapter 2 


Solving linear systems 


2.1 Echelon form of a matrix 


Definition 2.1.1. A matrix in row-echelon form is a matrix which has the following 


properties: 
1. The first nonzero entry in each row is a 1. 
2. The first 1 of each row appears to the right of the first 1 in the row above it. 
3. If any row consists entirely of zeroes, it appears at the bottom of the matrix. 


Definition 2.1.2. A matrix in reduced row-echelon form is a matrix in row-echelon form 
which has the additional requirement that the leading 1 of each row has only zeroes above 


and below it. 


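Example 2.1.3. Each of these matrices is in row-echelon form

\[
\begin{bmatrix} 1 & 4 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\quad
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\quad
\begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\quad
\begin{bmatrix} 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

but only the last two are in reduced row-echelon form.

2.1.1 Back substitution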


The significance of an augmented matrix in row-echelon form is that it is easy to find the solution of the associated system.

If the augmented matrix is in reduced row-echelon form, one can simply “read off” the solution of the associated system.


We can solve a system that is in triangular form

\[
\begin{aligned}
2x_1 - x_2 + 3x_3 - 2x_4 &= 1 \\
x_2 - 2x_3 + 3x_4 &= 2 \\
4x_3 + 3x_4 &= 3 \\
4x_4 &= 4
\end{aligned}
\]

using the technique of back substitution as follows:

\[
\begin{aligned}
4x_4 = 4 &\implies x_4 = 1 \\
4x_3 + 3 \cdot 1 = 3 &\implies x_3 = 0 \\
x_2 - 2 \cdot 0 + 3 \cdot 1 = 2 &\implies x_2 = -1 \\
2x_1 - (-1) + 3 \cdot 0 - 2 \cdot 1 = 1 &\implies x_1 = 1
\end{aligned}
\]

So the solution to this system is $(1, -1, 0, 1)$.
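Back substitution is mechanical enough to write as a short loop. The following is a minimal sketch, assuming an upper triangular coefficient matrix with nonzero diagonal entries; the function name `back_substitute` is an assumption for illustration.

```python
from fractions import Fraction

def back_substitute(U, b):
    """Solve Ux = b for upper triangular U with nonzero diagonal entries."""
    n = len(b)
    x = [Fraction(0)] * n
    # Work from the last equation up, using the values already found.
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(b[i] - known) / U[i][i]
    return x

U = [[2, -1, 3, -2],
     [0, 1, -2, 3],
     [0, 0, 4, 3],
     [0, 0, 0, 4]]
b = [1, 2, 3, 4]
print(back_substitute(U, b))
```

The loop order mirrors the hand computation: $x_4$ first, then $x_3$, and so on up to $x_1$.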


Definition 2.1.4. The elementary row operations for matrices: 
I. Interchange two rows. 
II. Multiply a row by a nonzero constant.


III. Add (a multiple of) one row to another. 


Definition 2.1.5. Two matrices are said to be row-equivalent iff one can be obtained 


from the other by a sequence of elementary row operations. 


Thus, two equal matrices are certainly row-equivalent, but two row-equivalent 
matrices need not be equal. 
The reason for this name is that performing a row operation does not change the 


solution set of the system. Thus, two row-equivalent systems have the same solution set. 
2.2 Solving linear systems


Definition 2.2.1. Gaussian elimination is the following method of solving systems of 


linear equations: 
1. Write the system as an augmented matrix. 


2. Use elementary row operations to convert this matrix into an equivalent matrix which 


is in row-echelon form. 
3. Write this new matrix as a system of linear equations. 
4. Solve this simplified equivalent system using back-substitution. 


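Example 2.2.2. Consider the following system:

\[
\begin{aligned}
x_1 - 3x_3 &= -2 \\
3x_1 + x_2 - 2x_3 &= 5 \\
2x_1 + 2x_2 + x_3 &= 4
\end{aligned}
\]

First, we write the system as an augmented matrix:

\[
\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 3 & 1 & -2 & 5 \\ 2 & 2 & 1 & 4 \end{array}\right]
\]

Second, we perform elementary row operations as follows:

\[
\begin{aligned}
&\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 2 & 2 & 1 & 4 \end{array}\right] && (-3)R_1 + R_2 \to R_2 \\
&\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 0 & 2 & 7 & 8 \end{array}\right] && (-2)R_1 + R_3 \to R_3 \\
&\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 0 & 0 & -7 & -14 \end{array}\right] && (-2)R_2 + R_3 \to R_3 \\
&\left[\begin{array}{ccc|c} 1 & 0 & -3 & -2 \\ 0 & 1 & 7 & 11 \\ 0 & 0 & 1 & 2 \end{array}\right] && (-\tfrac{1}{7})R_3 \to R_3
\end{aligned}
\]

Third, we write this last matrix as a system of equations:

\[
\begin{aligned}
x_1 - 3x_3 &= -2 \\
x_2 + 7x_3 &= 11 \\
x_3 &= 2
\end{aligned}
\]

Finally, we use back-substitution to obtain

\[
\begin{aligned}
x_2 + 7 \cdot 2 = 11 &\implies x_2 = -3 \\
x_1 - 3 \cdot 2 = -2 &\implies x_1 = 4
\end{aligned}
\]

Thus, Gaussian elimination yields the solution $(4, -3, 2)$.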


Gauss-Jordan elimination 


Definition 2.2.3. Gauss-Jordan elimination is the following method of solving systems 


of linear equations: 


1. Write the system as an augmented matrix. 


2. Use elementary row operations to convert this matrix into an equivalent matrix which 


is in reduced row-echelon form. 


3. Write this new matrix as a system of linear equations. 


4. Solve this simplified equivalent system using back-substitution. 


So Gauss-Jordan elimination is just an extension of Gaussian elimination where you 
convert the matrix all the way to reduced row-echelon form before converting back to a 


system of equations. 


Example 2.2.4. Continuing from the previous example, we could convert the matrix to reduced row-echelon form as follows:

\[
\begin{aligned}
&\left[\begin{array}{ccc|c} 1 & 0 & 0 & 4 \\ 0 & 1 & 7 & 11 \\ 0 & 0 & 1 & 2 \end{array}\right] && (3)R_3 + R_1 \to R_1 \\
&\left[\begin{array}{ccc|c} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 2 \end{array}\right] && (-7)R_3 + R_2 \to R_2
\end{aligned}
\]

Now when we convert this matrix back into a linear system, we see that it immediately gives the solution $(4, -3, 2)$. This is the point at which you can simply “read off” the solution from the matrix.
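Gauss-Jordan elimination can be sketched in code using only the three elementary row operations. The version below is one possible arrangement of the steps (it clears each pivot column as soon as the pivot is found), not necessarily the order used in the worked example; the function name `rref` is an assumption.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row-echelon form of M (a list of rows)."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]          # type I: swap
        lead = M[pivot_row][col]
        M[pivot_row] = [v / lead for v in M[pivot_row]]          # type II: scale
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p
                        for a, p in zip(M[r], M[pivot_row])]     # type III: add multiple
        pivot_row += 1
    return M

augmented = [[1, 0, -3, -2],
             [3, 1, -2, 5],
             [2, 2, 1, 4]]
print(rref(augmented))
```

For a consistent system with a unique solution, the last column of the result is the solution vector.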


Remark. I don’t require you to write the particular row operation being used, as I have 
done above. However, I do recommend it as it is a good way of avoiding computational 


mistakes. 


Remark. Whenever you are working with an augmented matrix and you obtain a row which is all zeroes except for the last entry, then you have an inconsistent system. That is, if you get a row of the form

\[ \left[\begin{array}{cccc|c} 0 & 0 & \cdots & 0 & c \end{array}\right] \]
for $c \neq 0$, then the original system of linear equations has no solution.


Definition 2.2.5. One particularly important and useful kind of system is one in which all the constant terms are zero. Such a system is called a homogeneous system. It is a fact that every homogeneous system is consistent (i.e., has at least one solution). One easy way to remember this is to notice that every homogeneous system is satisfied by the trivial solution, that is, $x_1 = x_2 = \cdots = x_n = 0$. When you set all variables to zero, the left side of each equation becomes 0.


Theorem 2.2.6. A homogeneous system can only be row-equivalent to another homoge- 


neous system. 


Proof. No row operation alters any column of Os. 


Theorem 2.2.7. A homogeneous system with more variables than equations must have infinitely many solutions.


Proof. The reduced row-echelon form has at most as many nonzero rows as the original matrix has rows (equations). Each nonzero row corresponds to a leading variable which will be given as a function of free variables, so the number of free variables is

total − leading = free.

So if there are fewer leading variables than total variables, the number of free variables is positive. The presence of even one free variable indicates infinitely many solutions.


Example 2.2.8. We can solve the homogeneous system

\[
\begin{aligned}
2x_1 + 2x_2 - x_3 + x_5 &= 0 \\
-x_1 - x_2 + 2x_3 - 3x_4 + x_5 &= 0 \\
x_1 + x_2 - 2x_3 - x_5 &= 0 \\
x_3 + x_4 + x_5 &= 0
\end{aligned}
\]

by Gauss-Jordan elimination: the augmented matrix row-reduces as

\[
\left[\begin{array}{ccccc|c}
2 & 2 & -1 & 0 & 1 & 0 \\
-1 & -1 & 2 & -3 & 1 & 0 \\
1 & 1 & -2 & 0 & -1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0
\end{array}\right]
\sim \cdots \sim
\left[\begin{array}{ccccc|c}
1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\]

The corresponding system is

\[
\begin{aligned}
x_1 + x_2 + x_5 &= 0 \\
x_3 + x_5 &= 0 \\
x_4 &= 0
\end{aligned}
\]
Solving for the leading variables,
\[
\begin{aligned}
x_1 &= -x_2 - x_5 \\
x_3 &= -x_5 \\
x_4 &= 0,
\end{aligned}
\]

so the solution set is

\[ \{(-s - t,\ s,\ -t,\ 0,\ t) : s, t \in \mathbb{R}\} \subseteq \mathbb{R}^5. \]

Note that no operation affects the far right column, as all these entries are 0.


HW] §2.1: 1, 6, 10 
HW | §2.2: 6, 10, 12, 19, 20 


For #6, just do Gauss-Jordan; no need to do Gaussian. 
2.3 Elementary matrices and finding inverses


2.3.1 Elementary matrices 


Definition 2.3.1. An $n \times n$ matrix is called an elementary matrix if it can be obtained from the identity matrix $I_n$ by a single elementary row operation.


Example 2.3.2.

\[
E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\qquad
E_3 = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

$E_1$ comes from $I_3$ by an application of the first row operation: interchanging two rows.
$E_2$ comes from $I_3$ by an application of the second row operation: multiplying one row by the nonzero constant 3.

$E_3$ comes from $I_3$ by an application of the third row operation: adding twice the third row to the first row.


2.3.2 Representation of Row Operations 


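Example 2.3.3. Suppose we have the matrices

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\quad \text{and} \quad
E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

so that $E_1$ is the elementary matrix obtained by swapping the first two rows of $I_3$. Now we work out the matrix products as

\[
E_1 A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}
\]
\[
A E_1 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 & 3 \\ 5 & 4 & 6 \\ 8 & 7 & 9 \end{bmatrix}
\]

Conclusion:
multiplying by $E_1$ on the left $\leftrightarrow$ swapping the first two rows of $A$
multiplying by $E_1$ on the right $\leftrightarrow$ swapping the first two columns of $A$.

\[
E_2 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 21 & 24 & 27 \end{bmatrix}
\]

So multiplying on the left by $E_2$ is the same as multiplying the third row by 3.

This is the same operation by which $E_2$ was obtained from the identity matrix.

\[
E_3 A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 1+14 & 2+16 & 3+18 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\]

So multiplying on the left by $E_3$ is the same as adding twice the third row to the first.

This is the same operation by which $E_3$ was obtained from the identity matrix.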


Row operations correspond to (matrix) multiplication by elementary matrices. Every- 
thing that can be performed by row operations can similarly be performed using elementary 


matrices. 
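The correspondence between row operations and left multiplication by elementary matrices can be confirmed directly. This is a small sketch in plain Python; `matmul` is a helper name assumed for illustration, and the matrices are those of Examples 2.3.2 and 2.3.3.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
E1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # swap rows 1 and 2 of I
E2 = [[1, 0, 0], [0, 1, 0], [0, 0, 3]]   # multiply row 3 of I by 3
E3 = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]   # add twice row 3 of I to row 1

print(matmul(E1, A))   # rows 1 and 2 of A swapped
print(matmul(A, E1))   # columns 1 and 2 of A swapped
print(matmul(E2, A))   # row 3 of A tripled
print(matmul(E3, A))   # row 1 of A replaced by row 1 + 2 * row 3
```

Multiplying on the right instead acts on columns, as the second product shows.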


Earlier: two matrices are row-equivalent iff there is some sequence of row operations 


which would convert one into the other. Now: 


Definition 2.3.4. Two matrices $A$ and $B$ are row-equivalent iff there is some sequence of elementary matrices $E_1, E_2, \dots, E_k$ such that

\[ E_k E_{k-1} \cdots E_2 E_1 A = B. \]


Inverses and Elementary Matrices 


Theorem 2.3.5. If $E$ is an elementary matrix, then $E$ is invertible and its inverse $E^{-1}$ is an elementary matrix of the same type.

Example 2.3.6. Since

\[
E_3 = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

comes by $2R_3 + R_1 \to R_1$, we choose the operation that would “undo” this, namely, $(-2)R_3 + R_1 \to R_1$. Then the elementary matrix corresponding to this is

\[
E_3^{-1} = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]


Theorem 2.3.7 (Characterization of Invertibility). The following are equivalent:

1) $A$ is invertible.
2) $A$ can be written as the product of elementary matrices.
3) $A$ is row equivalent to $I$.
4) The RREF (reduced row-echelon form) of $A$ is $I$.
5) The system of $n$ equations in $n$ unknowns given by $Ax = b$ has exactly one solution.
6) The system of $n$ equations in $n$ unknowns given by $Ax = 0$ has only the trivial solution $x_1 = x_2 = \cdots = x_n = 0$.


Corollary 2.3.8. Let $A$ be square. If $BA = I$, or if $AB = I$, then $B = A^{-1}$.


Proof. First, we show $A$ is invertible by using (6) of the previous theorem. Suppose $Ax = 0$, so that $x$ is a homogeneous solution. Left-multiply this equation by $B$ and use the first hypothesis $BA = I$ to get

\[ BAx = B0 \implies Ix = 0 \implies x = 0. \]

So $x = 0$. Then by the previous theorem, $A$ is invertible. Thus we can right-multiply the first hypothesis by $A^{-1}$ to obtain

\[ BAA^{-1} = IA^{-1} \implies B = A^{-1}. \]

The proof for $AB = I$ follows by the same argument.
2.3.3 How to find inverses


From part (2) of the theorem, $A$ is invertible iff
\[ E_k \cdots E_2 E_1 A = I. \]
Multiply on the right by $A^{-1}$ to get

\[ E_k \cdots E_2 E_1 A A^{-1} = I A^{-1}, \quad \text{or} \]

\[ E_k \cdots E_2 E_1 I = A^{-1}. \]
SO: the same sequence of elementary operations that takes $A$ to $I$ will take $I$ to $A^{-1}$.
This suggests a method:
1. Write $[A \mid I]$.
2. Apply row operations to $[A \mid I]$ to obtain $[I \mid X]$.
3. Then $X = A^{-1}$.
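Example 2.3.9. Let $A = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}$. We compute the inverse:

\[
\begin{aligned}
&\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ -1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \\
\sim\ &\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right] && R_1 + R_2 \to R_2, \\
\sim\ &\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 2 & 1 & 1 & -1 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right] && -R_3 + R_2 \to R_2, \\
\sim\ &\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 2 & 1 & 1 & -1 \end{array}\right] && R_2 \leftrightarrow R_3, \\
\sim\ &\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & \tfrac12 & \tfrac12 & -\tfrac12 \end{array}\right] && \tfrac12 R_3 \to R_3, \\
\sim\ &\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & \tfrac12 & -\tfrac12 & \tfrac12 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & \tfrac12 & \tfrac12 & -\tfrac12 \end{array}\right] && -R_3 + R_1 \to R_1.
\end{aligned}
\]

Thus,
\[
A^{-1} = \begin{bmatrix} \tfrac12 & -\tfrac12 & \tfrac12 \\ 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & -\tfrac12 \end{bmatrix}.
\]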


IMPORTANT NOTE: by the theorem, if the RREF of A is not J, then A is not 
invertible. If you wind up with a matrix B ~ A that has a row of 0s during row-reduction, 


then A is not invertible! 
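The $[A \mid I] \to [I \mid X]$ method can be sketched as a short routine. This version uses exact fractions and raises an error when a pivot cannot be found, which corresponds to hitting a row of zeros on the left side; the function name `inverse` is an assumption for illustration.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row-reducing [A | I] to [I | X]."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]            # swap rows
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]            # scale to a leading 1
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n:] for row in M]                      # the right block is X

A = [[1, 0, 1], [-1, 1, 1], [0, 1, 0]]
print(inverse(A))
```

The row operations chosen by the routine may differ from those in the worked example, but the resulting inverse is the same because inverses are unique.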


HW] §2.3: 4, 11, 19, 21, 23, 24, 25 
2.4 Systems of equations & Invertibility


2.4.1 The number of solutions of a system of linear equations 


Theorem 2.4.1. Every system of linear equations has no solutions, exactly one solution, 


or infinitely many solutions. 


Proof. It is clear by example that a system can have no solutions or one solution. Therefore, 
we just need to show that any system with more than one solution actually has infinitely 


many. 


Assume that $x_1 \neq x_2$ are solutions, and define $x_0 := x_1 - x_2 \neq 0$. Then
\[ Ax_0 = A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0. \tag{$*$} \]
Now for $t \in \mathbb{R}$, the system has a new solution $x_1 + t x_0$:
\[ A(x_1 + t x_0) = Ax_1 + tAx_0 = b + t0 = b, \quad \text{by } (*). \]

Since there are infinitely many $t \in \mathbb{R}$ and $x_0 \neq 0$, there are infinitely many solutions

\[ x_t = x_1 + t x_0. \]


Theorem 2.4.2. If A is invertible, then Ax = b has a unique solution. 


Proof. We have at least one solution, since $x = A^{-1}b$ works:
\[ Ax = A(A^{-1}b) = (AA^{-1})b = Ib = b. \]
To see this is the only solution, suppose $Ay = b$ also. Then

\[ A^{-1}Ay = A^{-1}b \]

\[ Iy = A^{-1}b, \]

so $y = A^{-1}b$, the same solution.
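Example 2.4.3. Solve the system

\[
\begin{aligned}
x_1 + x_3 &= 1 \\
-x_1 + x_2 + x_3 &= -3 \\
x_2 &= 12
\end{aligned}
\]
So
\[
A = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
A^{-1} = \begin{bmatrix} \tfrac12 & -\tfrac12 & \tfrac12 \\ 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & -\tfrac12 \end{bmatrix}
\]

(We computed the inverse of this matrix already.) So the solution is

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= A^{-1}b
= \begin{bmatrix} \tfrac12 & -\tfrac12 & \tfrac12 \\ 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & -\tfrac12 \end{bmatrix}
\begin{bmatrix} 1 \\ -3 \\ 12 \end{bmatrix}
= \begin{bmatrix} \tfrac12 + \tfrac32 + 6 \\ 12 \\ \tfrac12 - \tfrac32 - 6 \end{bmatrix}
= \begin{bmatrix} 8 \\ 12 \\ -7 \end{bmatrix}
\]
Check:
\[
8 + (-7) = 1, \qquad -8 + 12 + (-7) = -3, \qquad 12 = 12.
\]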


Theorem 2.4.4. Let $A, B$ be $n \times n$ matrices. If $AB$ is invertible, then so are $A$ and $B$.


We’ll prove this later, but the idea is similar to the following: if $1/a$ and $1/b$ are both defined, then $(1/a)(1/b) = \frac{1}{ab}$ is defined. I.e., if $a \neq 0$ and $b \neq 0$, then $ab \neq 0$.


The method of finding inverses can also tell you what conditions b must satisfy for a 


system to be solvable; it indicates what the solution will look like in terms of b. 


Example 2.4.5. Consider the system 


10x, F 5x2 oa 1523 = by 


Zy+ t2- #3 = bg 


7% 2X2 + 3 = b3 
We can attempt to solve this system symbolically by row-reducing the augmented matrix


10 5 15 b Oss 28 1p 
: ar? Ri > Ry 

ape le | o by 
R2+ R33 R 
Li, Bs ia ABs Os 3 Oi Beh a a 


—2Ro+R, 73 Ri 


3R3 > Rz 


Ryo Ro 


aS as a by 
ie 0 1 0 abe a 363 
Ro © Rg 


2 —R +R, Ro 


Ro + R3 > Rg 


1 0 0 gb + $b2— igbs 
0 1 0 qb2 + 3b3 
0 0 1 4b, — $bo+ kbs 


23 > Rg 


R3+ Rh, > Ry 


So the solution, as a function of }, is the vector 


1 1 4 
peli P02 7603 
r= zb2 + 3b3 


1 1 1 
e501 — 9 00+F 7p ba 


But this is solvable because A was invertible. 


Example 2.4.6. Consider the system 


221 alr, 225 + L3 = by 


Ly Lg — 2X3 = be 


ert on nee t2+ 23 = bg 
We attempt to solve this system symbolically by row-reducing the augmented matrix


oO 
(=) 


2 2 1 by 3 by — 2be 


—2Ro+R, > Ry 
1 1 -1 b | ~}] 1 1 -!1 ba 
Ro+Rh3 > R 
= es es OO. 0) bgsb by ve 7G 
1 1 -1 bo 
Ry oO Ro 
~10 0 1 4-2 
gR2 > Ro 
0 0 0 b2+b3 
1 1 0 4b,+46 
a a2 Ryo Ro 
ox 0 0 1 $b — 3bp 
sRz > Ro 
0 0 0 62+ 63 


From the third row, we see that we need $b_2 + b_3 = 0$, i.e., $b_3 = -b_2$. Thus, the system has a solution iff $b$ is of the


form 


in which case the RREF is 


and the solution is of the form 


r= t teER. 


HW] §2.4: 1, 4,5, 9,11 
Determinants


3.3 Determinants by cofactor expansion 


You know functions of a real variable, like
$f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$, or $g : \mathbb{R} \to \mathbb{R}$ by $g(x) = \sin x$.
We've seen functions of a vector, like

$f : \mathbb{R}^2 \to \mathbb{R}^2$ by $f(x) = Ax$,


where 


Coming soon: 


$g : \mathbb{R}^2 \to \mathbb{R}$ by $g(x) = x_1^2 + x_2^2$, so $g(x) = \|x\|^2$.


We can also define functions of a general matrix, like
\[ f(A) = \operatorname{sum}(A) = \sum_{i,j} a_{ij}. \]


This example isn't so useful. Here's one that is.

Definition 3.3.1. The determinant of a $2 \times 2$ matrix $A$ is

\[
\det A = \begin{vmatrix} a & b \\ c & d \end{vmatrix} := ad - bc.
\]

This defines a function $f : M_2 \to \mathbb{R}$ by $f(A) = \det(A)$. ($M_2 = \{2 \times 2 \text{ matrices}\}$.)
NOTE: don't confuse with absolute value; a determinant can be negative!
How to extend it to $f : M_n \to \mathbb{R}$? Recursively.


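Definition 3.3.2. If $A$ is square, the minor of entry $a_{ij}$ is the determinant of the submatrix obtained by removing the row and column in which $a_{ij}$ appears, and is denoted $M_{ij}$. The cofactor $c_{ij}$ is the number $(-1)^{i+j} M_{ij}$.

Example 3.3.3. The cofactors of the identity matrix $I_3$ are

\[
\begin{array}{lll}
c_{11} = +\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1, &
c_{12} = -\begin{vmatrix} 0 & 0 \\ 0 & 1 \end{vmatrix} = 0, &
c_{13} = +\begin{vmatrix} 0 & 1 \\ 0 & 0 \end{vmatrix} = 0, \\[2ex]
c_{21} = -\begin{vmatrix} 0 & 0 \\ 0 & 1 \end{vmatrix} = 0, &
c_{22} = +\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1, &
c_{23} = -\begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix} = 0, \\[2ex]
c_{31} = +\begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix} = 0, &
c_{32} = -\begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix} = 0, &
c_{33} = +\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1.
\end{array}
\]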


Definition 3.3.4. The adjoint of A is the transpose of the cofactor matrix, denoted 
adj(A). 


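Example 3.3.5. The adjoint matrix of

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\]

is

\[
\operatorname{adj}(A) =
\begin{bmatrix}
+\begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} & -\begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} & +\begin{vmatrix} 4 & 5 \\ 7 & 8 \end{vmatrix} \\[2ex]
-\begin{vmatrix} 2 & 3 \\ 8 & 9 \end{vmatrix} & +\begin{vmatrix} 1 & 3 \\ 7 & 9 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 7 & 8 \end{vmatrix} \\[2ex]
+\begin{vmatrix} 2 & 3 \\ 5 & 6 \end{vmatrix} & -\begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix} & +\begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix}
\end{bmatrix}^T
=
\begin{bmatrix} -3 & 6 & -3 \\ 6 & -12 & 6 \\ -3 & 6 & -3 \end{bmatrix}^T
=
\begin{bmatrix} -3 & 6 & -3 \\ 6 & -12 & 6 \\ -3 & 6 & -3 \end{bmatrix}.
\]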


NOTE: the adjoint need not be symmetric — this was a fluke for this example. 
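
The definition of the adjoint can be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the helper names `minor`, `det`, and `adjugate` are chosen here for illustration, and the determinant is computed by cofactor expansion along the first row.

```python
def minor(m, i, j):
    """Submatrix of m with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 0:
        return 1  # convention: the empty matrix has determinant 1
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjugate(m):
    """Transpose of the cofactor matrix of the square matrix m."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(adjugate(A))  # [[-3, 6, -3], [6, -12, 6], [-3, 6, -3]]
```

For a matrix that is not symmetric in this accidental way, the final transpose step matters; dropping it would return the cofactor matrix instead of the adjoint.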
3.3.1 Determinants by cofactors

Theorem 3.3.6. $\det A$ can be computed by performing a cofactor expansion along any row or column:

\[
\begin{aligned}
\det A &= a_{i1}c_{i1} + a_{i2}c_{i2} + \dots + a_{in}c_{in} && \text{along the } i^{\text{th}} \text{ row} \\
&= a_{1j}c_{1j} + a_{2j}c_{2j} + \dots + a_{nj}c_{nj} && \text{along the } j^{\text{th}} \text{ column.}
\end{aligned}
\]


Can use ANY row or column — so pick one with a lot of 0’s to make life easier! 


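Example 3.3.7. Compute the determinant by cofactor expansion.

\[
\begin{aligned}
\det A &= \begin{vmatrix} 2 & 0 & 1 & 2 \\ -4 & 0 & 3 & 1 \\ 0 & 2 & 6 & -3 \\ 1 & -1 & 0 & 4 \end{vmatrix} \qquad \text{(use 2nd col)} \\[1ex]
&= -(0)\begin{vmatrix} -4 & 3 & 1 \\ 0 & 6 & -3 \\ 1 & 0 & 4 \end{vmatrix}
+ (0)\begin{vmatrix} 2 & 1 & 2 \\ 0 & 6 & -3 \\ 1 & 0 & 4 \end{vmatrix}
- (2)\begin{vmatrix} 2 & 1 & 2 \\ -4 & 3 & 1 \\ 1 & 0 & 4 \end{vmatrix}
+ (-1)\begin{vmatrix} 2 & 1 & 2 \\ -4 & 3 & 1 \\ 0 & 6 & -3 \end{vmatrix}
\end{aligned}
\]

Now, compute each of these. Expanding the first along its second column:

\[
\begin{aligned}
(-2)\begin{vmatrix} 2 & 1 & 2 \\ -4 & 3 & 1 \\ 1 & 0 & 4 \end{vmatrix}
&= (-2)\left[ (-1)\begin{vmatrix} -4 & 1 \\ 1 & 4 \end{vmatrix} + (3)\begin{vmatrix} 2 & 2 \\ 1 & 4 \end{vmatrix} - (0)\begin{vmatrix} 2 & 2 \\ -4 & 1 \end{vmatrix} \right] \\
&= (-2)\big( (-1)(-16 - 1) + (3)(8 - 2) \big) \\
&= (-2)(17 + 18) \\
&= -70
\end{aligned}
\]

Expanding the second along its first column:

\[
\begin{aligned}
(-1)\begin{vmatrix} 2 & 1 & 2 \\ -4 & 3 & 1 \\ 0 & 6 & -3 \end{vmatrix}
&= (-1)\left[ (2)\begin{vmatrix} 3 & 1 \\ 6 & -3 \end{vmatrix} - (-4)\begin{vmatrix} 1 & 2 \\ 6 & -3 \end{vmatrix} + (0)\begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix} \right] \\
&= (-1)\big( (2)(-15) + (4)(-15) \big) \\
&= 90
\end{aligned}
\]

So $\det A = -70 + 90 = 20$.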
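
The recursive definition translates directly into code. Below is a small Python sketch (the function names are illustrative, not from the notes) that expands along the first row and skips zero entries, applied to the matrix of Example 3.3.7. By Theorem 3.3.6 the result does not depend on which row or column is used, so it should agree with the second-column expansion done by hand.

```python
def minor(matrix, i, j):
    """Return the submatrix with row i and column j removed (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(matrix) if k != i]


def det_cofactor(matrix):
    """Determinant by cofactor expansion along the first row."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    if n == 2:
        (a, b), (c, d) = matrix
        return a * d - b * c
    total = 0
    for j, entry in enumerate(matrix[0]):
        if entry == 0:
            continue  # a zero entry contributes nothing to the expansion
        total += (-1) ** j * entry * det_cofactor(minor(matrix, 0, j))
    return total


A = [
    [2, 0, 1, 2],
    [-4, 0, 3, 1],
    [0, 2, 6, -3],
    [1, -1, 0, 4],
]
print(det_cofactor(A))  # 20
```

Cofactor expansion does on the order of $n!$ multiplications, so this sketch is only practical for small matrices.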


The next theorems deal with square matrices, since only square matrices can be 


invertible or have determinants. 


Theorem 3.3.8. $A$ is invertible iff $\det A \neq 0$.

Proof. Later.

Lemma 3.3.9. If $i \neq j$, then $a_{i1}c_{j1} + a_{i2}c_{j2} + \dots + a_{in}c_{jn} = 0$.

Sketch of proof. Following the book, we work part of the $3 \times 3$ case explicitly.
Consider $a_{11}c_{31} + a_{12}c_{32} + a_{13}c_{33}$, obtained by choosing $i = 1$, $j = 3$. Replace the third row of $A$ with a copy of the first:

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},
\qquad
A' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{bmatrix}.
\]

$A'$ has two identical rows, so its RREF has a row of zeros. By the Invertibility Characterization Thm, this means $A'$ is not invertible. Hence, by the previous theorem, $\det A' = 0$.

The cofactor matrix associated with $A'$ is

\[
C' = \begin{bmatrix} c'_{11} & c'_{12} & c'_{13} \\ c'_{21} & c'_{22} & c'_{23} \\ c'_{31} & c'_{32} & c'_{33} \end{bmatrix}
= \begin{bmatrix} c'_{11} & c'_{12} & c'_{13} \\ c'_{21} & c'_{22} & c'_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix},
\]

since the last row depends only on the first two rows of $A'$, and these are the same as the first two rows of $A$. If we compute $\det A'$ using cofactor expansion along the third row, the result is

\[
\det A' = a_{11}c'_{31} + a_{12}c'_{32} + a_{13}c'_{33} = a_{11}c_{31} + a_{12}c_{32} + a_{13}c_{33}.
\]

Therefore, $a_{11}c_{31} + a_{12}c_{32} + a_{13}c_{33} = 0$.


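Theorem 3.3.10. If $\det A \neq 0$, then

\[
A^{-1} = \frac{1}{\det A} \operatorname{adj} A.
\]

Proof. We will prove that $A \operatorname{adj} A = (\det A) I$. Then, since $\det A \neq 0$, the previous theorem tells us $A$ is invertible, and we can multiply by $A^{-1}$:

\[
(\det A) I = A \operatorname{adj} A
\implies A^{-1} (\det A) I = A^{-1} A \operatorname{adj} A
\implies A^{-1} = \frac{1}{\det A} \operatorname{adj} A,
\]

to complete the proof. So:

\[
B := A \operatorname{adj} A =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
c_{11} & c_{21} & \cdots & c_{j1} & \cdots & c_{n1} \\
c_{12} & c_{22} & \cdots & c_{j2} & \cdots & c_{n2} \\
\vdots & \vdots & & \vdots & & \vdots \\
c_{1n} & c_{2n} & \cdots & c_{jn} & \cdots & c_{nn}
\end{bmatrix}
\]

Now the entries of $B = [b_{ij}]$ are given by

\[
b_{ij} = a_{i1}c_{j1} + a_{i2}c_{j2} + \dots + a_{in}c_{jn}.
\]

If $i = j$, then $b_{ii}$ is the cofactor expansion of $\det A$ along the $i^{\text{th}}$ row, as given in Theorem 3.3.6.

If $i \neq j$, then $b_{ij} = 0$ by the Lemma. Therefore,

\[
B = A \operatorname{adj} A =
\begin{bmatrix} \det A & 0 & \cdots & 0 \\ 0 & \det A & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \det A \end{bmatrix}
= \det A \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}
= (\det A) I.
\]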


By the initial remarks, this completes the proof. 


NOTE: this is not such a good formula for computing inverses. The row-reduction 
method is probably less work. 


However, this formula will help establish useful properties of the inverse. 
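
The identity $A \operatorname{adj} A = (\det A) I$ and the resulting inverse formula can be sanity-checked numerically. The sketch below is an illustration only: the function names are invented here, the test matrix is an arbitrary invertible $3 \times 3$ example, and exact rational arithmetic from Python's standard `fractions` module is used to avoid rounding.

```python
from fractions import Fraction


def minor(m, i, j):
    """Submatrix of m with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 0:
        return 1
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjugate(m):
    """Transpose of the cofactor matrix."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_via_adjugate(m):
    """A^{-1} = (1 / det A) adj A, valid only when det A != 0."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible (det = 0)")
    return [[Fraction(entry, d) for entry in row] for row in adjugate(m)]


A = [[0, 3, -1], [1, 1, 2], [2, 0, -1]]
print(det(A))                                # 17
print(matmul(A, adjugate(A)))                # 17 times the identity
print(matmul(A, inverse_via_adjugate(A)) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

As the note above says, this is a poor way to compute inverses in practice; the value of the sketch is only that it exercises the formula.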


Next: use the formula to prove those theorems about triangular matrices. (Recall: this 


includes diagonal matrices!) 


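Theorem 3.3.11. If $A$ is triangular, then $\det A = a_{11}a_{22} \cdots a_{nn}$.

Proof. Suppose $A$ is upper triangular. Compute the determinant via a cofactor expansion down the first column:

\[
\begin{aligned}
\det A &= \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix} \\[1ex]
&= a_{11} \begin{vmatrix} a_{22} & \cdots & a_{2n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{vmatrix} + 0 \cdot c_{21} + \dots + 0 \cdot c_{n1} \\[1ex]
&= a_{11} a_{22} \begin{vmatrix} a_{33} & \cdots & a_{3n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{vmatrix} \\[1ex]
&= a_{11} a_{22} a_{33} \begin{vmatrix} a_{44} & \cdots & a_{4n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{vmatrix} \\
&\;\;\vdots \\
&= a_{11} a_{22} \cdots a_{(n-2)(n-2)} \begin{vmatrix} a_{(n-1)(n-1)} & a_{(n-1)n} \\ 0 & a_{nn} \end{vmatrix} \\[1ex]
&= a_{11} a_{22} \cdots a_{nn}.
\end{aligned}
\]

Corollary 3.3.12. Suppose $A$ is triangular (/diagonal). Then $A$ is invertible iff all diagonal entries are nonzero.

Proof. Let $A$ be a triangular matrix. Then

\[
\begin{aligned}
A \text{ is invertible} &\iff \det A \neq 0 && \text{an earlier Thm} \\
&\iff a_{11} a_{22} \cdots a_{nn} \neq 0 && \text{last result} \\
&\iff a_{ii} \neq 0, \ \forall i && \text{no zero-divisors in } \mathbb{R}.
\end{aligned}
\]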


Corollary 3.3.13. The inverse of an invertible triangular matrix is also triangular (of the same type).

Proof. Suppose $A$ is an invertible triangular matrix. Invertibility gives

\[
A^{-1} = \frac{1}{\det A} \operatorname{adj} A.
\]

Scalar multiplication does not change triangularity, so this shows that we only need to prove $\operatorname{adj} A$ is triangular.

Let's take $A$ to be upper triangular, for definiteness. Then
\[
i > j \implies a_{ij} = 0.
\]
The adjoint is the transpose of the cofactor matrix, so we want to show
\[
i < j \implies c_{ij} = (-1)^{i+j} M_{ij} = 0, \quad \text{or} \quad M_{ij} = 0.
\]

Let $B_{ij}$ be the submatrix obtained from $A$ when the $i^{\text{th}}$ row and $j^{\text{th}}$ column are deleted, so that $M_{ij} = \det B_{ij}$.

Since $i < j$, the $(i+1)^{\text{th}}$ row of $A$ starts with at least $i$ zeros.
But the $i^{\text{th}}$ row of $B_{ij}$ is just this same row with the $j^{\text{th}}$ entry removed, so the $i^{\text{th}}$ row of $B_{ij}$ starts with at least $i$ zeros. So $B_{ij}$, which is itself upper triangular, has a zero on the diagonal, in the $i^{\text{th}}$ row. This means

\[
0 = \det B_{ij} = M_{ij} \implies c_{ij} = 0.
\]

[Diagram illustrating the final argument.]


In the theory of abstract algebra, this means that the upper triangular matrices $U_n$ form a (unitary) ring (and similarly for lower triangular):

1. $U_n$ is an abelian group under addition. ($A + B = B + A \in U_n$; $U_n$ contains inverses & identity.)

2. $U_n$ has a multiplication which is associative & distributive.

3. (Unitary) $U_n$ has a multiplicative identity.

In fact, taking into account scalar multiplication, $U_n$ is an algebra over $\mathbb{R}$.


HW] §3.3: 1, 3, 5, 6, 7, 12, 18 
3.2 Properties of the Determinant Function

What properties does $f(A) = \det A$ have, as a mapping $f : M_n \to \mathbb{R}$? Is it additive? Multiplicative? Linear? Continuous? Differentiable? Under what transformations is the determinant invariant?

Theorem 3.2.1. $\det A = \det A^T$.

Proof. First, note that this is clearly true for $2 \times 2$ matrices, just from the basic formula.
$\det A$ can be calculated by cofactor expansion along the first row.

$\det A^T$ can be calculated by cofactor expansion along the first column.

These are the same thing: the entries agree, and each minor in one expansion is the determinant of the transpose of the corresponding minor in the other, so the claim follows by induction on $n$.


This means that most “row” statements about determinants are still true for columns. 


3.2.1 Determinants and row operations 
This provides a much faster way of computing determinants than by cofactor expansion. 
Theorem 3.2.2. Let $A$ be $n \times n$.

1. If B comes from A by multiplying a row by k, then det B = kdet A. 

2. If B comes from A by swapping two rows, then det B = — det A. 

3. If B comes from A by adding a multiple of one row to another, then det B = det A. 
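Example 3.2.3. For $A = [a_{ij}]$,

\[
\begin{aligned}
\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
&= ka_{11}c_{11} + ka_{12}c_{12} + ka_{13}c_{13} \\
&= k(a_{11}c_{11} + a_{12}c_{12} + a_{13}c_{13}) = k \det A.
\end{aligned}
\]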


Similarly, this picture should convince you that swapping rows produces a sign change in 


the determinant: 
How to exploit this to compute determinants:


1. Use row operations to reduce the matrix to triangular form. 
2. Record the operations used as a leading coefficient. 


3. Find the determinant of the reduced matrix by taking the product along the diagonal 
(prev thm). 


4. Multiply by the coefficient to obtain the desired determinant. 


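Example 3.2.4. Find the determinant of

\[
A = \begin{bmatrix} 0 & 3 & -1 \\ 1 & 1 & 2 \\ 2 & 0 & -1 \end{bmatrix}.
\]

Solution.

\[
\begin{aligned}
\det A &= 2 \begin{vmatrix} 0 & 3 & -1 \\ 1 & 1 & 2 \\ 1 & 0 & -\tfrac12 \end{vmatrix} && \tfrac12 R_3 \to R_3 \\[1ex]
&= 2 \begin{vmatrix} 0 & 3 & -1 \\ 0 & 1 & \tfrac52 \\ 1 & 0 & -\tfrac12 \end{vmatrix} && -R_3 + R_2 \to R_2 \\[1ex]
&= 2 \begin{vmatrix} 0 & 0 & -\tfrac{17}{2} \\ 0 & 1 & \tfrac52 \\ 1 & 0 & -\tfrac12 \end{vmatrix} && -3R_2 + R_1 \to R_1 \\[1ex]
&= -2 \begin{vmatrix} 1 & 0 & -\tfrac12 \\ 0 & 1 & \tfrac52 \\ 0 & 0 & -\tfrac{17}{2} \end{vmatrix} && R_3 \leftrightarrow R_1 \\[1ex]
&= (-2)\left(-\tfrac{17}{2}\right) = 17 && \text{triangular matrix thm}
\end{aligned}
\]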
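
The four-step recipe above can be sketched in Python as follows. This is an illustrative implementation, not the notes' own: it uses only row swaps (each flips the sign) and row additions (which leave the determinant unchanged), so the "leading coefficient" to track is just a sign. Exact fractions from the standard library avoid rounding error.

```python
from fractions import Fraction


def det_by_elimination(matrix):
    """Determinant via reduction to upper triangular form."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the triangular form has a zero on the diagonal
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row swap changes the sign of the determinant
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Adding a multiple of one row to another leaves det unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]  # product along the diagonal
    return result


A = [[0, 3, -1], [1, 1, 2], [2, 0, -1]]
print(det_by_elimination(A))  # 17
```

The sequence of row operations differs from the hand computation, but the determinant is the same, which is the point of Theorem 3.2.2.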
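Example 3.2.5. Find the determinant of

\[
A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 5 & 6 \\ 2 & 2 & 1 \end{bmatrix}.
\]

Solution.

\[
\det A = \det A^T = \begin{vmatrix} 1 & 3 & 2 \\ 0 & 5 & 2 \\ 2 & 6 & 1 \end{vmatrix}
= \begin{vmatrix} 1 & 3 & 2 \\ 0 & 5 & 2 \\ 0 & 0 & -3 \end{vmatrix} = -15.
\]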

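Theorem 3.2.6. If $A$ has a row or column of zeros, then $\det A = 0$.

Proof. Do a cofactor expansion along that row or column and obtain

\[
\det A = 0 \cdot c_1 + 0 \cdot c_2 + \dots + 0 \cdot c_n = 0.
\]

Theorem 3.2.7. $\det kA = k^n \det A$.

Proof. Pull a factor of $k$ out of each of the $n$ rows, one row at a time:

\[
\det kA =
\begin{vmatrix} ka_{11} & ka_{12} & \cdots & ka_{1n} \\ ka_{21} & ka_{22} & \cdots & ka_{2n} \\ \vdots & & & \vdots \\ ka_{n1} & ka_{n2} & \cdots & ka_{nn} \end{vmatrix}
= k \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ ka_{21} & ka_{22} & \cdots & ka_{2n} \\ \vdots & & & \vdots \\ ka_{n1} & ka_{n2} & \cdots & ka_{nn} \end{vmatrix}
= \dots
= k^n \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}
= k^n \det A.
\]

The determinant is not additive; i.e., it is NOT generally true that $\det(A + B) = \det A + \det B$.

But it IS multiplicative: $\det(AB) = \det A \cdot \det B$.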


For the proof, need some lemmas. 


Lemma 3.2.8. If $E$ is elementary, then $\det(EB) = \det E \cdot \det B$.

Proof. case (1) $E$ comes by multiplying a row by a constant $k$.

Then $EB$ is $B$ with one row multiplied by $k$, so

\[
\det(EB) = k \det B.
\]

But we already saw $\det E = k$, so

\[
\det(EB) = \det E \cdot \det B.
\]

case (2) $E$ comes by row swap. Similar.

case (3) $E$ comes by row addition. Similar.


Theorem 3.2.9. $A$ is invertible iff $\det A \neq 0$.

Proof. Let $R = E_k \cdots E_1 A$ be the RREF of $A$. Applying the lemma $k$ times,

\[
\det R = \det E_k \cdots \det E_2 \cdot \det E_1 \cdot \det A.
\]

Since $\det E \neq 0$ for any elementary matrix,

\[
\det R = 0 \iff \det A = 0.
\]

($\Rightarrow$) Suppose $A$ is invertible. Then $R = I$ and $\det R = 1 \neq 0$ shows $\det A \neq 0$.
($\Leftarrow$) Suppose $\det A \neq 0$. Then $\det R \neq 0$, so $R$ cannot have a row of zeros. Then by prev Thm, $R = I$. This implies $A$ is invertible, by Characterization Thm.

Definition 3.2.10. A vector $x$ is proportional to a vector $y$ iff $x = ay$ for some $a \neq 0$.


Corollary 3.2.11. If $A$ has two proportional rows (or two proportional columns) then $\det A = 0$.

Proof. Suppose $A$ has two proportional rows. Then if $B$ is the RREF of $A$, $B$ has a row of zeros, so $B$ is not invertible and $\det B = 0$. But $B$ is invertible iff $A$ is invertible, so $\det A = 0$ also. (We are using the Invertibility Characterization Thm.)

If $A$ has two proportional columns, then $A^T$ has two proportional rows. Since $\det A = \det A^T$, this reduces to the previous case and we are done.


Theorem 3.2.12. $\det$ is multiplicative: $\det AB = \det A \cdot \det B$.

Proof. case (1) $A$ is invertible. Then $A = E_1 E_2 \cdots E_k$ for some elementary matrices $E_i$, so

\[
\begin{aligned}
AB &= E_1 E_2 \cdots E_k B \\
\det AB &= \det(E_1 E_2 \cdots E_k B) \\
&= \det E_1 \det E_2 \cdots \det E_k \det B \\
&= \det(E_1 E_2 \cdots E_k) \cdot \det B \\
&= \det A \cdot \det B
\end{aligned}
\]

case (2) $A$ is not invertible. Then $AB$ is not invertible, by Thm. Hence

\[
0 = \det AB = 0 \cdot \det B = \det A \det B.
\]


Example 3.2.13. Let 


Then 


\[
\det A = 2 - 0 = 2, \qquad \det B = 8 - (-1) = 9, \qquad \det AB = -8 - (-26) = 18.
\]


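Theorem 3.2.14. If $A$ is invertible, then $\det A^{-1} = (\det A)^{-1} = \frac{1}{\det A}$.

Proof. Since $A^{-1}A = I$,

\[
\begin{aligned}
\det(A^{-1}A) &= \det I \\
\det A^{-1} \det A &= 1 && \text{prev Thm} \\
\det A^{-1} &= \frac{1}{\det A} && \det A \neq 0.
\end{aligned}
\]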


HW] §3.2: 1, 6(a), 8, 9, 13, 15, 17, 18, 34 




3.5 Applications of Determinants 


3.5.1 Cramer’s Rule 


Theorem 3.5.1 (Cramer's Rule). Suppose $Ax = b$, where $\det A \neq 0$. Then the solution to the system is given by

\[
x_i = \frac{\det A_i}{\det A}, \qquad i = 1, \dots, n,
\]

where $A_i$ is obtained from $A$ by replacing the $i^{\text{th}}$ column of $A$ with $b$.


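Example 3.5.2. Solve the system with Cramer's Rule.

\[
\begin{aligned}
2x_1 + x_2 + 2x_3 &= 1 \\
-4x_1 - x_2 + x_3 &= 0 \\
4x_1 - 3x_3 &= 1
\end{aligned}
\]

Solution. First, gather the matrices

\[
A = \begin{bmatrix} 2 & 1 & 2 \\ -4 & -1 & 1 \\ 4 & 0 & -3 \end{bmatrix}, \qquad
A_1 = \begin{bmatrix} 1 & 1 & 2 \\ 0 & -1 & 1 \\ 1 & 0 & -3 \end{bmatrix},
\]
\[
A_2 = \begin{bmatrix} 2 & 1 & 2 \\ -4 & 0 & 1 \\ 4 & 1 & -3 \end{bmatrix}, \qquad
A_3 = \begin{bmatrix} 2 & 1 & 1 \\ -4 & -1 & 0 \\ 4 & 0 & 1 \end{bmatrix}.
\]

Then compute the determinants (introduce DIAGONAL TECHNIQUE for $3 \times 3$):

\[
\det A = 6, \qquad \det A_1 = 6, \qquad \det A_2 = -18, \qquad \det A_3 = 6.
\]

Now the solution is

\[
x = \begin{bmatrix} \dfrac{\det A_1}{\det A} \\[2ex] \dfrac{\det A_2}{\det A} \\[2ex] \dfrac{\det A_3}{\det A} \end{bmatrix}
= \begin{bmatrix} \tfrac{6}{6} \\[1ex] \tfrac{-18}{6} \\[1ex] \tfrac{6}{6} \end{bmatrix}
= \begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix}.
\]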
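
Cramer's Rule is short enough to write out in code. The sketch below is illustrative only: the names `det` and `cramer_solve` are invented here, and the coefficient matrix is an assumed reading of Example 3.5.2, chosen because it reproduces the four determinants $6, 6, -18, 6$ stated in the example.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def cramer_solve(a, b):
    """Solve a x = b by Cramer's Rule; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's Rule needs det A != 0")
    solution = []
    for i in range(len(a)):
        # A_i: replace the i-th column of A with b.
        a_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i), d))
    return solution


# Assumed reading of the system in Example 3.5.2.
A = [[2, 1, 2], [-4, -1, 1], [4, 0, -3]]
b = [1, 0, 1]
print([str(x) for x in cramer_solve(A, b)])  # ['1', '-3', '1']
```

Each component costs one extra determinant, which is why the method is attractive when only a single $x_i$ is wanted and unattractive for large systems.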
Advantages of Cramer's Rule:

1. Fast for small systems. 

2. Can compute x; without computing the entire solution. 

Disadvantages: 

1. Too long/slow for large systems. Row-reduction is more efficient for systems larger 


than 3 x 3. 


3.5.2 Linear systems of the form $Ax = \lambda x$

Note: such a system is equivalent to

\[
\begin{aligned}
\lambda I x &= Ax \\
\lambda I x - Ax &= 0 \\
(\lambda I - A)x &= 0
\end{aligned}
\]

which is homogeneous for

\[
\lambda I - A =
\begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}
- \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= \begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} \\ -a_{21} & \lambda - a_{22} & -a_{23} \\ -a_{31} & -a_{32} & \lambda - a_{33} \end{bmatrix}.
\]

Definition 3.5.3. The characteristic polynomial of $A$ is

\[
\det(\lambda I - A),
\]

and the characteristic equation of $A$ is

\[
\det(\lambda I - A) = 0.
\]

The system $(\lambda I - A)x = 0$ will have nontrivial solutions iff $\det(\lambda I - A) = 0$.
(A nonzero determinant means invertibility of $(\lambda I - A)$, which implies only the trivial solution exists.)

Definition 3.5.4. A solution $\lambda$ of the characteristic equation is called an eigenvalue. The corresponding nontrivial solutions of $(\lambda I - A)x = 0$ are called eigenvectors.

More later.


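Example 3.5.5. For $A = \begin{bmatrix} 2 & 0 \\ 3 & 1 \end{bmatrix}$ from a previous example, the characteristic equation is

\[
\det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & 0 \\ -3 & \lambda - 1 \end{vmatrix} = (\lambda - 2)(\lambda - 1) - 0 = \lambda^2 - 3\lambda + 2 = 0.
\]

Thus the eigenvalues of $A$ are $\lambda = 1, 2$.
For $\lambda = 1$, solving $(\lambda I - A)x = 0$, i.e.,

\[
\begin{bmatrix} -1 & 0 \\ -3 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\implies
\left[\begin{array}{cc|c} -1 & 0 & 0 \\ -3 & 0 & 0 \end{array}\right]
\sim
\left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right],
\]

yields $x_1 = 0$ and $x_2 =$ anything. So an eigenvector corresponding to $\lambda = 1$ is

\[
\begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

Meanwhile, for $\lambda = 2$,

\[
\begin{bmatrix} 0 & 0 \\ -3 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\implies
\left[\begin{array}{cc|c} 0 & 0 & 0 \\ -3 & 1 & 0 \end{array}\right]
\sim
\left[\begin{array}{cc|c} 1 & -\tfrac13 & 0 \\ 0 & 0 & 0 \end{array}\right],
\]

which means that $x_1 - \tfrac13 x_2 = 0$, or $3x_1 = x_2$. So an eigenvector corresponding to $\lambda = 2$ is

\[
\begin{bmatrix} 1 \\ 3 \end{bmatrix}.
\]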
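
A quick numerical check of this example, as a hedged Python sketch: the matrix is taken to be $A = \begin{bmatrix} 2 & 0 \\ 3 & 1 \end{bmatrix}$, the eigenvector $(1, 3)$ for $\lambda = 2$ is read off from $3x_1 = x_2$, and the function names are illustrative. For a $2 \times 2$ matrix the characteristic polynomial is $\lambda^2 - (\operatorname{tr} A)\lambda + \det A$, so the quadratic formula gives the eigenvalues.

```python
import math


def eigenvalues_2x2(m):
    """Real roots of det(lambda*I - m) = lambda^2 - trace*lambda + det."""
    (a, b), (c, d) = m
    trace, determinant = a + d, a * d - b * c
    disc = trace * trace - 4 * determinant
    if disc < 0:
        return []  # no real eigenvalues
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})


def is_eigenpair(m, lam, v):
    """True if v is nonzero and m v = lam v (up to floating-point tolerance)."""
    mv = [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]
    nonzero = any(x != 0 for x in v)
    return nonzero and all(math.isclose(mv[i], lam * v[i]) for i in range(len(v)))


A = [[2, 0], [3, 1]]
print(eigenvalues_2x2(A))            # [1.0, 2.0]
print(is_eigenpair(A, 1, [0, 1]))    # True
print(is_eigenpair(A, 2, [1, 3]))    # True
```

Any nonzero scalar multiple of an eigenvector passes the same check, which reflects the fact that eigenvectors are only determined up to scaling.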


HW] §3.2: 10, 11, 12, 26 
HW] §3.5: 1-4 


Chapter 4 
Vector Spaces 


4.1 Vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$

Add two vectors geometrically by putting the tail of $x$ on the head of $y$ (or v.v.).

$2x$ is the vector in the direction of $x$ but twice as long.

NOTE: a positive scalar multiple of a vector still points in the same direction; a negative multiple points in the opposite direction.
$-x = (-1)x$ is the vector in the opposite direction of $x$ and of the same length.

$x - y$ goes from the head of $y$ to the head of $x$. Think

\[
(x - y) + y = x.
\]

Written in cartesian coordinates,

\[
x = (x_1, x_2, x_3) = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.
\]

So if in doubt, go back to $x = (x_1, x_2, x_3)$ and check componentwise.

\[
\begin{aligned}
x + y &= (x_1 + y_1, x_2 + y_2, x_3 + y_3) \\
kx &= (kx_1, kx_2, kx_3), \qquad k \in \mathbb{R}
\end{aligned}
\]

Just like matrices, two vectors are equal iff all their corresponding entries are equal.

The vector from a point $P = (x, y)$ to a point $Q = (x', y')$ is

\[
\overrightarrow{PQ} = Q - P = \begin{bmatrix} x' - x \\ y' - y \end{bmatrix}.
\]


4.1.1 Vector arithmetic 

Recall from matrices: 

Theorem 4.1.1. If $u, v, w$ are vectors of the same size, and $k, \ell \in \mathbb{R}$, then

(i) $u + v = v + u$,
(ii) $u + (v + w) = (u + v) + w = u + v + w$,
(iii) $u + 0 = 0 + u = u$,
(iv) $u + (-u) = 0$,
(v) $k(\ell u) = (k\ell)u$,
(vi) $(k + \ell)u = ku + \ell u$,
(vii) $k(u + v) = ku + kv$, and
(viii) $1u = u$.

4.2 Vector Spaces

A vector space is a set of things that you can take linear combinations of. 
Definition 4.2.1. A vector space is a set $V$ such that:

(a) There is an operation $\oplus : V \times V \to V$ satisfying

    (a) $u \oplus v = v \oplus u$, for any $u, v \in V$.
    (b) $u \oplus (v \oplus w) = (u \oplus v) \oplus w$, for any $u, v, w \in V$.
    (c) There is an element $0$ such that $u \oplus 0 = u$, for any $u \in V$.
    (d) For every $u \in V$, there exists an element $v \in V$ such that $u \oplus v = 0$. Denote this by $v = -u$.

(b) There is a field $F$ acting on $V$ via $\odot : F \times V \to V$, such that:

    (a) $a \odot (u \oplus v) = (a \odot u) \oplus (a \odot v)$, for any $u, v \in V$ and $a \in F$.
    (b) $(a + b) \odot u = (a \odot u) \oplus (b \odot u)$, for any $u \in V$ and $a, b \in F$.
    (c) $(ab) \odot u = a \odot (b \odot u)$, for any $u \in V$ and $a, b \in F$.
    (d) $1 \odot u = u$, for every $u \in V$.

Elements of $V$ are called vectors. Elements of $F$ are called scalars.

The first four properties make $V$ into an (abelian) group.

Usually, $F = \mathbb{R}, \mathbb{Q}, \mathbb{C}$. When $F = \mathbb{R}$, $V$ is called a real vector space. When $F = \mathbb{C}$, $V$ is called a complex vector space.

Theorem 4.2.2. From the properties above, one can prove that any vector space satisfies:

(a) $0 \odot u = 0$, for every $u \in V$.
(b) $a \odot 0 = 0$, for every $a \in F$.
(c) If $a \odot u = 0$, then either $a = 0$ or $u = 0$.
(d) $(-1) \odot u = -u$, for every $u \in V$.


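Example 4.2.3. $V = (\mathbb{R}^n, +, \cdot)$ is the vector space we've studied until now.

A vector here is just $u = (u_1, \dots, u_n)$.

Example 4.2.4. $V = (M_{mn}, +, \cdot)$ is the vector space of $m \times n$ matrices.

A vector here is

\[
u = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & & \vdots \\ u_{m1} & \cdots & u_{mn} \end{bmatrix}.
\]

Example 4.2.5. $V = (U_n, +, \cdot)$ is the vector space of $n \times n$ upper triangular matrices.

A vector here is

\[
u = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ & \ddots & \vdots \\ 0 & & u_{nn} \end{bmatrix}.
\]

Example 4.2.6. $V = (D_n, +, \cdot)$ is the vector space of $n \times n$ diagonal matrices.

A vector here is

\[
u = \begin{bmatrix} u_{11} & & 0 \\ & \ddots & \\ 0 & & u_{nn} \end{bmatrix}.
\]

Example 4.2.7. $V = (S_n, +, \cdot)$ is the vector space of $n \times n$ symmetric matrices.

A vector here is

\[
u = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & \ddots & \vdots \\ u_{1n} & \cdots & u_{nn} \end{bmatrix}, \qquad u_{ij} = u_{ji}.
\]

Example 4.2.8. $V = (\mathrm{Tr}^0, +, \cdot)$ is the vector space of $n \times n$ matrices with trace 0.

A vector here is

\[
u = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & & \vdots \\ u_{n1} & \cdots & u_{nn} \end{bmatrix}
\quad \text{where} \quad \operatorname{Tr}(u) = \sum_{i=1}^{n} u_{ii} = 0.
\]

Example 4.2.9. $V = (C(X), +, \cdot)$ is the vector space of continuous functions on $X$.

Here, $X$ might be $\mathbb{R}^n$ or some subset of it. A vector here is
$u : X \to \mathbb{R}$.

Example 4.2.10. $V = (\mathbb{R}^{\mathbb{R}}, +, \cdot)$ is the vector space of $\mathbb{R}$-valued functions on $\mathbb{R}$.

A vector here is just any function $u : \mathbb{R} \to \mathbb{R}$.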
Example 4.2.11. $V = (C^k(X), +, \cdot)$ is the vector space of functions on $X$ whose derivatives $u^{(k)}(x), u^{(k-1)}(x), \dots, u''(x), u'(x), u(x)$ are continuous.

Here, $X$ might be $\mathbb{R}$ or $(a, b)$. A vector here is
$u : X \to \mathbb{R}$.

This is extendable to multivariable functions: if $X = \mathbb{R}^3$, then $C^2(X)$ would consist of all functions $u : \mathbb{R}^3 \to \mathbb{R}$ for which $\dfrac{\partial^2 u}{\partial x_i \, \partial x_j}$ is continuous for each $i, j = 1, 2, 3$.

Example 4.2.12. $V = (C^\infty(X), +, \cdot)$ is the vector space of functions on $X$ whose derivatives $u^{(k)}(x)$ are continuous, for any $k = 0, 1, 2, \dots$.

Example 4.2.13. $V = (P_n, +, \cdot)$ is the vector space of polynomials of degree $k \le n$.
Here, $X$ might be $\mathbb{R}$ or $(a, b)$. A vector here is

\[
u(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n.
\]

Example 4.2.14. $V = (\mathrm{Exp}, \oplus, \cdot)$ is the vector space of exponentials $e^{px}$, if we define

\[
a e^{px} \oplus b e^{qx} := ab \, e^{(p+q)x}.
\]


HW] §4.2: 1, 2, 3, 13, 24, 25 


Prove that every real vector space (other than the trivial vector space {0}) contains 


infinitely many vectors. 
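4.3 Subspaces

Definition 4.3.1. Let $W$ be a nonempty subset of $V$ which is a vector space under the operations of $V$. Then $W$ is a subspace of $V$. (Note: both $V$ and $W$ have the same field $F$.)

Theorem 4.3.2. If $W \subseteq V$ is nonempty and $W$ is closed under the operations of $V$, then $W$ is a subspace of $V$. That is, $W$ is a subspace whenever
\[
u, v \in W \implies \text{(i) } u \oplus v \in W, \text{ and (ii) } a \odot u \in W \text{ for any } a \in F.
\]

In particular, if $W \subseteq V$ and $W$ is closed, then $W$ is a vector space.

Example 4.3.3. Every vector space $V$ has two (trivial) subspaces: $V$ itself and the zero subspace $\{0\}$.

Example 4.3.4. $\mathbb{R} \subseteq \mathbb{R}^2 \subseteq \mathbb{R}^3 \subseteq \mathbb{R}^n$, for $n > 3$.
Let $m < n$ and $X = \mathbb{R}$. Then $P_m \subseteq P_n \subseteq C^\infty(X) \subseteq C^k(X) \subseteq C(X)$.

Example 4.3.5. Let $V = \mathbb{R}^3$ and $W = \{(x, y, z) : x - 2y + 6z = 0\}$. This is a plane through the origin.

Example 4.3.6. Let $V = \mathbb{R}^3$ and $W = \{(a, b, c) : c = a + b\} = \{(a, b, a + b)\}$.

Example 4.3.7. Let $V = \mathbb{R}^3$ and $U = \{a u_1 + b u_2\} = \{\text{all linear combinations of } u_1 \text{ and } u_2\}$, where

\[
u_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.
\]

Then

\[
a u_1 + b u_2 = a \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} a \\ 0 \\ a \end{bmatrix} + \begin{bmatrix} 0 \\ b \\ b \end{bmatrix}
= \begin{bmatrix} a \\ b \\ a + b \end{bmatrix}.
\]

Theorem 4.3.8. Let $V$ be a vector space, and $u_1, u_2, \dots, u_k \in V$. Then

\[
W = \Big\{ \sum_{i=1}^{k} (a_i \odot u_i) : a_i \in F \Big\} = \{ (a_1 \odot u_1) \oplus \dots \oplus (a_k \odot u_k) \}
\]

is a subspace of $V$.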
Definition 4.3.9. For an $m \times n$ matrix $A$, the null space of $A$ is

\[
\operatorname{null}(A) := \{x : Ax = 0\}.
\]

Theorem 4.3.10. $\operatorname{null} A$ is a subspace of $\mathbb{R}^n$.

Proof. Let $x, y \in \operatorname{null}(A)$. Then $A(x + y) = Ax + Ay = 0 + 0 = 0$, so $x + y \in \operatorname{null}(A)$.
If $c \in \mathbb{R}$, then $A(cx) = cAx = c \cdot 0 = 0$, so $cx \in \operatorname{null}(A)$.

Definition 4.3.11. The kernel of a linear transform $T : \mathbb{R}^n \to \mathbb{R}^m$ is
\[
\ker(T) := \{x : T(x) = 0\} = T^{-1}(0).
\]
The range of a linear transformation $T$ is
\[
\operatorname{ran} T = \{y \in \mathbb{R}^m : y = T(x), \text{ for some } x \in \mathbb{R}^n\}.
\]

Theorem 4.3.12. $\ker T$ is a subspace of $\mathbb{R}^n$ and $\operatorname{ran} T$ is a subspace of $\mathbb{R}^m$.


Proof. HW. 


HW] §4.3: 19, 20, 23, 24, 25, 32, 40 


Prove that kerT is a subspace of R” and ranT is a subspace of R™, for any linear 


transformation T : R” > R™. 

Prove that for $T_A(x) = Ax$, $\ker T_A = \operatorname{null} A$.

Prove or disprove: $V = (\{u : \mathbb{R} \to \mathbb{R} : \lim_{|x| \to \infty} u(x) = 0\}, +, \cdot)$ is a vector space.

A function $f : \mathbb{R} \to \mathbb{R}$ is even iff $f(-x) = f(x)$; it is odd iff $f(-x) = -f(x)$. Prove or disprove: (i) the even functions on $\mathbb{R}$ form a vector space, (ii) the odd functions on $\mathbb{R}$ form a vector space.

Hint: for both of these, recall that the collection of all R-valued functions on R is a vector 


space and use the theorem. 
4.4 Span

Definition 4.4.1. If $S = \{v_1, v_2, \dots, v_k\}$ is any set of vectors from a vector space $V$, then the span of $S$ is

\[
\operatorname{span} S = \Big\{ \sum_{i=1}^{k} a_i v_i : a_i \in F \Big\} \quad \Big( = \Big\{ \sum_{i=1}^{k} (a_i \odot v_i) : a_i \in F \Big\} \Big).
\]

Conversely, if $V = \operatorname{span} S$ for some $S \subseteq V$, then $S$ is called a spanning set for $V$.

Theorem 4.4.2. For any $S \subseteq V$, the span of $S$ is a subspace of $V$.

Proof. Let $u = \sum a_i v_i$ and $w = \sum b_i v_i$. Then

\[
\begin{aligned}
u + w &= \sum a_i v_i + \sum b_i v_i = \sum (a_i + b_i) v_i \in \operatorname{span} S, \\
\text{and} \quad cu &= c \sum a_i v_i = \sum c a_i v_i \in \operatorname{span} S.
\end{aligned}
\]

Example 4.4.3. Let $V = P_8$ be the polynomials of degree at most 8. Then $\operatorname{span}\{x, x^3, x^5, x^7\}$ is the subspace of $P_8$ consisting of odd functions.


Example 4.4.4. Let $V = M_{22}$ be the $2 \times 2$ matrices, and let


Then span{A, B} is the subspace of M22 consisting of diagonal matrices. 


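Example 4.4.5. For $V = \mathbb{R}^3$ and

\[
v_1 = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}, \quad \text{and} \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix},
\]

is $v = (2, 1, 3)$ in $\operatorname{span}\{v_1, v_2\}$? We need to solve $a_1 v_1 + a_2 v_2 = v$:

\[
\left[\begin{array}{cc|c} 3 & 1 & 2 \\ 0 & 1 & 1 \\ 1 & -2 & 3 \end{array}\right]
\sim
\left[\begin{array}{cc|c} 3 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 5 \end{array}\right]
\sim
\left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & 1 \\ 0 & 0 & -14 \end{array}\right]
\]

This is inconsistent, so the answer is no: $v \notin \operatorname{span}\{v_1, v_2\}$.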
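
Span membership can be tested mechanically: $v \in \operatorname{span}\{v_1, \dots, v_k\}$ exactly when adjoining $v$ does not increase the rank. The following Python sketch is illustrative (the names `rank` and `in_span` are invented here, and the target vector $(2, 1, 3)$ is read from the augmented column of the example); it uses exact fractions and forward elimination.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors, by forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def in_span(vectors, v):
    """v lies in span(vectors) iff appending v does not raise the rank."""
    vectors = [list(u) for u in vectors]
    return rank(vectors + [list(v)]) == rank(vectors)


v1, v2 = (3, 0, 1), (1, 1, -2)
print(in_span([v1, v2], (2, 1, 3)))    # False
print(in_span([v1, v2], (4, 1, -1)))   # True, since (4, 1, -1) = v1 + v2
```

The rank test answers only the yes/no question; to find the coefficients one would row-reduce the augmented matrix as in the example.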
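Example 4.4.6. We can solve the homogeneous system

\[
\begin{aligned}
2x_1 + 2x_2 - x_3 + x_5 &= 0 \\
-x_1 - x_2 + 2x_3 - 3x_4 + x_5 &= 0 \\
x_1 + x_2 - 2x_3 - x_5 &= 0 \\
x_3 + x_4 + x_5 &= 0
\end{aligned}
\]

by Gauss-Jordan elimination as:

\[
\begin{aligned}
\left[\begin{array}{ccccc|c} 2 & 2 & -1 & 0 & 1 & 0 \\ -1 & -1 & 2 & -3 & 1 & 0 \\ 1 & 1 & -2 & 0 & -1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{array}\right]
&\sim
\left[\begin{array}{ccccc|c} 0 & 0 & 3 & -6 & 3 & 0 \\ -1 & -1 & 2 & -3 & 1 & 0 \\ 0 & 0 & 0 & -3 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{array}\right]
&& \begin{array}{l} 2R_2 + R_1 \to R_1 \\ R_2 + R_3 \to R_3 \end{array} \\[1ex]
&\sim
\left[\begin{array}{ccccc|c} 1 & 1 & -2 & 3 & -1 & 0 \\ 0 & 0 & 3 & -6 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{array}\right]
&& \begin{array}{l} R_1 \leftrightarrow R_2 \\ (-1)R_1 \to R_1 \\ (-\tfrac13)R_3 \to R_3 \end{array} \\[1ex]
&\sim
\left[\begin{array}{ccccc|c} 1 & 1 & -2 & 0 & -1 & 0 \\ 0 & 0 & 3 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \end{array}\right]
&& \begin{array}{l} (-3)R_3 + R_1 \to R_1 \\ 6R_3 + R_2 \to R_2 \\ (-1)R_3 + R_4 \to R_4 \end{array} \\[1ex]
&\sim
\left[\begin{array}{ccccc|c} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]
&& \begin{array}{l} 2R_4 + R_1 \to R_1 \\ (-3)R_4 + R_2 \to R_2 \\ R_2 \leftrightarrow R_4 \end{array}
\end{aligned}
\]

The corresponding system is

\[
\begin{aligned}
x_1 + x_2 + x_5 &= 0 \\
x_3 + x_5 &= 0 \\
x_4 &= 0
\end{aligned}
\]

Solving for the leading variables,

\[
\begin{aligned}
x_1 &= -x_2 - x_5 \\
x_3 &= -x_5 \\
x_4 &= 0,
\end{aligned}
\]

so the solution set is

\[
\{(-s - t, s, -t, 0, t) : s, t \in \mathbb{R}\} \subseteq \mathbb{R}^5.
\]

Thus, any solution can be written

\[
\begin{bmatrix} -s - t \\ s \\ -t \\ 0 \\ t \end{bmatrix}
= s \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} -1 \\ 0 \\ -1 \\ 0 \\ 1 \end{bmatrix}.
\]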
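
The elimination in Example 4.4.6 can be reproduced with a short Gauss-Jordan routine. This is a sketch under stated assumptions: the function name `rref` is invented here, the coefficient matrix is the one read from the example, and the zero right-hand side is omitted because it never changes. The RREF of a matrix is unique, so the output does not depend on the order of row operations.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form, computed with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(pivot_row, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # free column: no leading 1 here
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]  # scale to a leading 1
        for i in range(len(m)):
            if i != pivot_row and m[i][col] != 0:
                f = m[i][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


A = [
    [2, 2, -1, 0, 1],
    [-1, -1, 2, -3, 1],
    [1, 1, -2, 0, -1],
    [0, 0, 1, 1, 1],
]
for row in rref(A):
    print([str(x) for x in row])

# The two spanning vectors of the solution set should satisfy A x = 0.
for x in ([-1, 1, 0, 0, 0], [-1, 0, -1, 0, 1]):
    print([sum(a * b for a, b in zip(row, x)) for row in A])  # [0, 0, 0, 0]
```

The free columns (here the second and fifth) correspond to the parameters $s$ and $t$ in the solution set.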


Theorem 4.4.7. Suppose $S$ is a spanning set for $V$. Let $R$ be some other subset of $V$. If each element of $S$ can be written as a linear combination of elements of $R$, then $R$ is a spanning set for $V$.

Proof. HW

HW] §4.4: 7, 8, 10, 11

Prove the last theorem.
This theorem may be helpful for #7, 8, 10.

4.5 Linear independence


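Example 4.5.1. Let $V = \mathbb{R}^3$ and $U = \{a u_1 + b u_2\} = \{\text{all linear combinations of } u_1 \text{ and } u_2\}$, where

\[
u_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.
\]

Then

\[
a u_1 + b u_2 = a \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} a \\ 0 \\ a \end{bmatrix} + \begin{bmatrix} 0 \\ b \\ b \end{bmatrix}
= \begin{bmatrix} a \\ b \\ a + b \end{bmatrix}.
\]

So $S_1 = \{u_1, u_2\}$ is a spanning set for $U$. But so are:

\[
S_2 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}
\quad \text{and} \quad
S_3 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \right\}.
\]

But $S_1$ is the most efficient in some sense: $S_2$ and $S_3$ both contain redundant information.

(If you were trying to describe the plane $U$, you only need two vectors.)

Important principle (to be formalized as a theorem shortly):

$S_1 \subseteq S_2$ and $\operatorname{span} S_1 = \operatorname{span} S_2 = U \implies$ every vector in $S_2$ is a lin. comb. of vectors of $S_1$.

However, containment is not quite the right idea for the theorem: consider

\[
S_4 = \left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}.
\]

One can see that no element of $S_1$ is contained in $S_4$, or vice versa. Nonetheless: each element of $S_4$ is a linear combination of elements of $S_1$ and vice versa:

\[
\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = u_1 - u_2, \qquad
\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = u_1 + u_2, \qquad
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = u_1 + 2u_2,
\]

and

\[
u_1 = \frac12 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + \frac12 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \qquad
u_2 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.
\]

This shows that $\operatorname{span} S_4 = U$.
Note also that $S_4$ is redundant in the sense that any of its elements can be "built" out of the other two:

\[
\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} - 2 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad
\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \frac13 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + \frac23 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = -\frac12 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + \frac32 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.
\]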


When describing a plane, you really only need two vectors! 


In order to avoid the "redundancy" in $S$, where some element can be written as a linear combination of the others, we introduce the condition of independence. The idea is that $u$ is not independent of $\{v_1, v_2, \dots, v_k\}$ if you can write

\[
u = a_1 v_1 + \dots + a_k v_k
\]

for some coefficients $a_1, \dots, a_k$.

Definition 4.5.2. The set of vectors $\{v_1, v_2, \dots, v_k\}$ is linearly dependent iff there exist constants $a_1, \dots, a_k$ (not all 0) such that
\[
\sum_{i=1}^{k} a_i v_i = 0.
\]
The set of vectors $\{v_1, v_2, \dots, v_k\}$ is linearly independent iff
\[
\sum_{i=1}^{k} a_i v_i = 0 \implies a_1 = a_2 = \dots = a_k = 0.
\]

In other words, the only way $\sum a_i v_i$ can equal 0 is if all the $a_i$ equal 0.
Linear independence is a condition that applies to sets of vectors.
Special case: no set containing the zero vector is linearly independent.

Special case: a set of 2 vectors is linearly independent iff neither vector is a scalar multiple of the other.


Recall from §2.2: 


Theorem 4.5.3. A homogeneous system of $m$ equations in $n$ unknowns always has a nontrivial solution when $m < n$.

Consider the homogeneous system

\[
\sum_{i=1}^{k} a_i v_i = 0,
\]

which can also be written

\[
a_1 v_1 + a_2 v_2 + \dots + a_k v_k
= \begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_k \end{bmatrix}
= Ba = 0.
\]

So the question of linear dependence vs. independence amounts to: does the system $Ba = 0$ have nontrivial solutions $a$ or only the trivial solution $a = 0$?


Example 4.5.4. Are the vectors linearly independent?

\[
v_1 = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad
v_3 = \begin{bmatrix} -3 \\ 2 \\ -1 \end{bmatrix}, \quad
v_4 = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}
\]

Need to look for nontrivial solutions of $\sum a_i v_i = 0$, i.e., of the homogeneous system

\[
\begin{aligned}
a_1 + a_2 - 3a_3 + 2a_4 &= 0 \\
2a_1 - 2a_2 + 2a_3 &= 0 \\
-a_1 + a_2 - a_3 &= 0
\end{aligned}
\]

This is a homogeneous system of 3 equations in 4 unknowns, so by the theorem there are nontrivial solutions, i.e., $\sum a_i v_i = 0$ does not imply that all the $a_i = 0$. So this set is linearly dependent.
The theorem above may be rephrased:

Corollary 4.5.5. In a vector space of dimension $n$, any set of $n + 1$ or more vectors will be linearly dependent.

(We haven't seen the formal definition of "dimension" yet, but for now all you need is that the dimension of $\mathbb{R}^n$ is $n$.)
Sometimes it takes more work to determine independence: 


Example 4.5.6. Are the vectors $v_1 = t^2 + t + 2$, $v_2 = 2t^2 + t$, and $v_3 = 3t^2 + 2t + 2$ linearly independent in $P_2$?

Need to look for nontrivial solutions of $\sum a_i v_i = 0$, i.e., of the homogeneous system

\[
\left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 1 & 1 & 2 & 0 \\ 2 & 0 & 2 & 0 \end{array}\right]
\sim
\begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ 0 & -2 & -2 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\]

Thus the coefficient matrix of the homogeneous system is noninvertible, and so there are infinitely many solutions. In other words, $\sum a_i v_i = 0$ does not imply that all the $a_i = 0$: the set $\{v_1, v_2, v_3\}$ is linearly dependent.
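
Independence of a finite set can be checked by comparing the rank with the number of vectors. The sketch below is illustrative: `rank` and `is_independent` are names chosen here, and each polynomial $at^2 + bt + c$ is encoded by its coefficient triple $(a, b, c)$, an encoding assumed for the example.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors, by forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def is_independent(vectors):
    """A finite set is independent iff its rank equals its size."""
    vectors = [list(v) for v in vectors]
    return rank(vectors) == len(vectors)


# Coefficients (t^2, t, 1) of the polynomials in Example 4.5.6.
v1, v2, v3 = (1, 1, 2), (2, 1, 0), (3, 2, 2)
print(is_independent([v1, v2, v3]))  # False: in fact v3 = v1 + v2
print(is_independent([v1, v2]))      # True
```

Here the dependency can be seen directly: $(t^2 + t + 2) + (2t^2 + t) = 3t^2 + 2t + 2$, so $v_1 + v_2 - v_3 = 0$.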


Theorem 4.5.7. Let $S = \{v_1, v_2, \dots, v_n\} \subseteq \mathbb{R}^n$, and let $A$ be a matrix whose rows are elements of $S$ (in any order). Then $S$ is linearly independent $\iff \det(A) \neq 0$.

Proof. ($\Rightarrow$) Let $B = A^T$, so the columns of $B$ are elements of $S$. If $S$ is linearly independent, then $Ba = 0$ implies $a = 0$, so this homogeneous system has only the trivial solution and so $B$ is invertible. But then $\det(A) = \det(A^T) = \det(B) \neq 0$.

($\Leftarrow$) We show the contrapositive: if $S$ is linearly dependent, then $\det(A) = 0$. Suppose $S$ is linearly dependent. Then one element of $S$ can be written as a linear combination of the others, so one row of $A$ can be written as a linear combination of the others. Then using row operations, one can eventually replace that row by a row of 0s. Therefore $A$ is not invertible, and so $\det(A) = 0$.


Theorem 4.5.8 (Characterization of Invertibility). The following are equivalent:

(1) $A$ is invertible (i.e., $A^{-1}$ exists).
(2) $A$ can be written as the product of elementary matrices.
(3) The RREF of $A$ is $I$ (so $A$ is row equivalent to $I$).
(4) The system $Ax = b$ has exactly one solution.
(5) The system $Ax = 0$ has only the trivial solution $x = 0$.
(6) $\det A \neq 0$.
(7) The rows/columns of $A$ are linearly independent vectors.


Theorem 4.5.9 (Independence and containment). Let $S_1, S_2$ be finite subsets of a vector space $V$, with $S_1 \subseteq S_2$.
(i) If $S_1$ is linearly dependent, then so is $S_2$. (ii) If $S_2$ is independent, then so is $S_1$.

Proof. (i) Let $S_1 = \{v_1, v_2, \dots, v_j\}$ and $S_2 = \{v_1, v_2, \dots, v_j, v_{j+1}, \dots, v_k\}$. If $S_1$ is dependent, there is some linear combination

\[
a_1 v_1 + \dots + a_j v_j = 0
\]

where not all $a_i = 0$. Hence

\[
a_1 v_1 + \dots + a_j v_j + 0 v_{j+1} + \dots + 0 v_k = 0,
\]

with not all $a_i = 0$, which shows that $S_2$ is dependent. (ii) is the contrapositive of (i).


HW] §4.5: 7, 9, 11, 13, 14, 17, 19, 23, 24, 25, 26

Prove that if $v \neq 0$, then $\{v\}$ is linearly independent.

Prove that if $\{v_1, v_2\}$ is a linearly dependent set, then one is a scalar multiple of the other, i.e., $v_1 = a v_2$ or $v_2 = a v_1$ for some $a \in \mathbb{R}$.
Prove that $\{0\}$ is linearly dependent. Using this, prove that any set containing 0 is linearly dependent.

4.6 Basis and dimension


4.6.1 Basis 


Definition 4.6.1. A basis for a vector space $V$ is a set of vectors $S = \{v_1, v_2, \dots, v_n\}$ such that

1. $S$ spans $V$ (i.e., $S$ is a spanning set for $V$), and
2. $S$ is linearly independent.

This means: for any $u \in V$,
(i) $u = \sum a_i v_i$, for some choice of coefficients $a_i \in \mathbb{R}$, and
(ii) $S$ is minimal in the sense that it contains no redundant information.

More precisely, the efficiency of (ii) ensures that there is ONLY ONE way to write $u$ as a linear combination of the $v_i$.

A primary use of basis is this: it suffices to define any linear transform by specifying its action on the basis. Suppose $S = \{u_1, u_2, \dots, u_n\}$ is a basis for $U$ and $T : U \to V$ is linear. If you know $T(u_i)$ for each $i$, then you know everything about $T$ because:
(i) given any $u \in U$, there is a unique way to write $u = \sum a_i u_i$, and
(ii) $T(u) = T\left(\sum a_i u_i\right) = \sum a_i T(u_i)$, because $T$ is linear.


Theorem 4.6.2. If $S = \{v_1, v_2, \dots, v_n\}$ is a basis for $V$, then any vector $u \in V$ can be written in a unique way as a linear combination of elements of $S$.

Proof. If $S$ is a basis, then it is a spanning set, so there is some solution of the system $u = \sum a_i v_i$ by definition of "spanning set". To see the uniqueness, suppose

\[
u = \sum_{i=1}^{n} a_i v_i = \sum_{i=1}^{n} b_i v_i.
\]

Then we need to show $a_i = b_i$ for each $i = 1, 2, \dots, n$.

\[
0 = u - u = \sum_{i=1}^{n} a_i v_i - \sum_{i=1}^{n} b_i v_i = \sum_{i=1}^{n} (a_i - b_i) v_i.
\]

$S$ is a basis, so the $v_i$ are linearly independent, whence $a_i - b_i = 0$ for each $i$.
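Example 4.6.3. Recall our previous example

\[
S_1 = \left\{ u_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, u_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}, \qquad
S_4 = \left\{ v_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, v_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, v_3 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}.
\]

We saw that $S_4$ was not linearly independent. Here is an example of how this leads to nonuniqueness of representation:

\[
\tfrac12 v_1 + \tfrac12 v_2
= \frac12 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + \frac12 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = u_1
= 2 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
= 2v_2 - v_3.
\]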


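Example 4.6.4. Let

\[
S = \left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \right\}.
\]

To check that $S$ is independent, row-reduce the matrix whose columns are the elements of $S$:

\[
\begin{bmatrix} 1 & 1 & 1 \\ -1 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 2 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 0 & -2 \end{bmatrix}.
\]

Every column has a pivot, so the matrix is invertible, and so $S$ is linearly independent by the Characterization of Invertibility theorem.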


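To check that $S$ spans $\mathbb{R}^3$, we need to be able to find a solution of

\[
x_1 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + x_3 \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\iff
\begin{bmatrix} 1 & 1 & 1 \\ -1 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} a \\ b \\ c \end{bmatrix},
\]

for any choice of $a, b, c$. However, the coefficient matrix has nonzero determinant and therefore is invertible, so there is always a unique solution, by Char of Inv theorem.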


Note: Since $\mathbb{R}^3$ is 3-dimensional, $S$ must have exactly 3 nonzero vectors if $S$ is to be a basis. If $S$ has 2 elements, it cannot span $\mathbb{R}^3$. If $S$ has 4 or more elements, then it cannot be independent, by a theorem. From the above example, one can see that this is equivalent to the following result:


Theorem 4.6.5. Let $S = \{v_1, v_2, \dots, v_n\} \subseteq \mathbb{R}^n$, and let $A$ be the matrix whose columns (or rows) are the elements of $S$. Then $S$ is a basis for $\mathbb{R}^n$ if and only if $\det(A) \neq 0$.

Proof. The Characterization of Invertibility theorem contains: (i) $S$ is linearly independent $\iff \det(A) \neq 0$, and (ii) $Ax = b$ has a unique solution for every $b$ (so the columns of $A$ span $\mathbb{R}^n$) $\iff \det(A) \neq 0$.


Example 4.6.6. Is $S = \{1 + t^2,\ -1 + t,\ 2 + 2t\}$ a basis for $P_2$?
To see that $S$ spans $P_2$, we need to see that a generic quadratic polynomial $a + bt + ct^2$ can be written as a linear combination of these three:
\[
a + bt + ct^2 = a_1(1 + t^2) + a_2(-1 + t) + a_3(2 + 2t) = (a_1 - a_2 + 2a_3) + (a_2 + 2a_3)t + a_1 t^2.
\]
Two polynomials agree for all $t$ if and only if the coefficients of the respective powers agree, so this gives three equations to solve:
\[
\begin{aligned}
1 &: & a_1 - a_2 + 2a_3 &= a \\
t &: & a_2 + 2a_3 &= b \\
t^2 &: & a_1 &= c
\end{aligned}
\quad\implies\quad
\begin{aligned}
a_1 &= c, \\
a_2 &= \tfrac{1}{2}(-a + b + c), \\
a_3 &= \tfrac{1}{4}(a + b - c).
\end{aligned}
\]
Suppose we are given the vector $5 - 2t - t^2$. Then this formula gives
\[
(a_1, a_2, a_3) = (-1, -4, 1) \implies 5 - 2t - t^2 = (-1)(1 + t^2) - 4(-1 + t) + (1)(2 + 2t).
\]
To check that $S$ is linearly independent, consider the homogeneous system
\[
\begin{aligned}
a_1 &= 0 \\
a_2 + 2a_3 &= 0 \\
a_1 - a_2 + 2a_3 &= 0
\end{aligned}
\quad\implies\quad
\begin{aligned}
a_1 &= 0, \\
a_2 &= 0, \\
a_3 &= 0.
\end{aligned}
\]
So the formula implies that this homogeneous system has only the trivial solution, and hence $S$ is independent.
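The coefficient formula from this example can be spot-checked numerically. The following is a minimal sketch, assuming the set $S = \{1 + t^2, -1 + t, 2 + 2t\}$; the function names `coords_in_S` and `expand` are illustrative and not part of the notes.

```python
from fractions import Fraction

def coords_in_S(a, b, c):
    """Coefficients (a1, a2, a3) with
    a + b t + c t^2 = a1 (1 + t^2) + a2 (-1 + t) + a3 (2 + 2t)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    a1 = c
    a2 = (-a + b + c) / 2
    a3 = (a + b - c) / 4
    return a1, a2, a3

def expand(a1, a2, a3):
    """Coefficients (constant, t, t^2) of a1 (1 + t^2) + a2 (-1 + t) + a3 (2 + 2t)."""
    return (a1 - a2 + 2 * a3, a2 + 2 * a3, a1)

coeffs = coords_in_S(5, -2, -1)   # the vector 5 - 2t - t^2
print(coeffs)
print(expand(*coeffs))            # should reproduce (5, -2, -1)
```

Expanding the computed coefficients returns the original polynomial, which is exactly the statement that the representation exists; uniqueness corresponds to the homogeneous system having only the trivial solution.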


Example 4.6.7. Is $S = \{1 + t^2,\ -1 + t,\ 2 + 2t\}$ a basis for $P_2$?
Alternative approach: note that $P_2$ has the same linear structure as $\mathbb{R}^3$ (to be made precise in §4.8). Any vector $a + bt + ct^2 \in P_2$ can be written as $(a, b, c) \in \mathbb{R}^3$ and vice versa, and under this encoding, the vector space operations are compatible:
\[
\begin{array}{ccc}
(a + bt + ct^2) + (p + qt + rt^2) & \longleftrightarrow & (a, b, c) + (p, q, r) \\
\| & & \| \\
(a + p) + (b + q)t + (c + r)t^2 & \longleftrightarrow & (a + p,\ b + q,\ c + r)
\end{array}
\]
So $S$ is a basis of $P_2$ if and only if $Q = \{(1,0,1), (-1,1,0), (2,2,0)\}$ is a basis of $\mathbb{R}^3$, and
\[
\det \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 0 \\ 2 & 2 & 0 \end{bmatrix} = -4 \neq 0,
\]
so $Q$ is a basis of $\mathbb{R}^3$ by Theorem 4.6.5.


Theorem 4.6.8. Let $S = \{v_1, v_2, \dots, v_n\}$ be any finite subset of a vector space $V$, and let $W = \operatorname{span} S$. Then some subset of $S$ is a basis for $W$.

Proof. If $S$ is linearly independent, then $S$ is a basis for $W$.
If not, then some $v_j$ can be written as a linear combination of the others, so delete it to obtain $S_2 = \{v_1, v_2, \dots, v_{j-1}, v_{j+1}, \dots, v_n\}$. Note that $\operatorname{span} S_2 = W$. If $S_2$ is linearly independent, we’re done; if not, then repeat. Since $S$ is finite, this process terminates.
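The idea of the proof can be turned into a small procedure. The sketch below is a variation, not the proof's exact steps: instead of deleting dependent vectors one at a time, it keeps each vector that is not already in the span of the vectors kept so far, which yields a subset of $S$ that is a basis for $\operatorname{span} S$. The helper names `rank` and `extract_basis` are illustrative, and the sample vectors are chosen only for demonstration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        # look for a pivot in this column at or below row r
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def extract_basis(S):
    """Keep each vector of S that is not in the span of the vectors already kept."""
    basis = []
    for v in S:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

# Sample data: the third vector satisfies v1 - 3*v2 + 2*v3 = 0, so it is dropped.
print(extract_basis([(1, -1, 0), (1, 1, 2), (1, 2, 3)]))
```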


Use this idea to solve #11, for example. 


HW] §4.6: 2, 6, 7, 8, 11, 15, 29, 36 


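Also: Let $\beta_1 = (4, 1)$ and $\beta_2 = (-2, 1)$.
(i) Show that $\{\beta_1, \beta_2\}$ is a basis for $\mathbb{R}^2$ and solve $(2, 5) = a\beta_1 + b\beta_2$.
(ii) Find a matrix $A$ such that $T_A\left(\left[\begin{smallmatrix} 1 \\ 0 \end{smallmatrix}\right]\right) = \beta_1$ and $T_A\left(\left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]\right) = \beta_2$.
(iii) Find a matrix $B$ such that $T_B(\beta_1) = \left[\begin{smallmatrix} 1 \\ 0 \end{smallmatrix}\right]$ and $T_B(\beta_2) = \left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]$.
(iv) Use $T_B$ to represent the vector $(2, 5)$ in terms of the basis $\{\beta_1, \beta_2\}$.
(v) Explain why $B$ is called a “change-of-basis matrix” (or in this book, a “transition matrix”).

4.6.2 Dimension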


Theorem 4.6.9. If $S = \{v_1, v_2, \dots, v_n\}$ is a basis for a vector space $V$ and $Q = \{w_1, w_2, \dots, w_r\}$ is a linearly independent set of vectors in $V$, then $r \le n$.

Corollary 4.6.10. Any two bases $B_1$ and $B_2$ of $V$ have the same number of elements.

Proof. (When the $B_i$ have finite cardinality): apply the theorem with $S = B_1$ and $Q = B_2$, and then the other way.


This means that the following definition makes sense: 


Definition 4.6.11. The dimension of a nonzero vector space V is the number dim V of 


vectors in a basis for V. If V = {0}, we define dimV = 0. 


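Example 4.6.12. $\dim \mathbb{R}^n = n$, for any $n = 1, 2, \dots$.
$\{1, t, t^2\}$ is a basis for $P_2$, so $\dim P_2 = 3$.
$\left\{ \left[\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right], \left[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\right], \left[\begin{smallmatrix} 0 & 0 \\ 1 & 0 \end{smallmatrix}\right], \left[\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\right] \right\}$ is a basis for $M_{22}$, so $\dim M_{22} = 4$.
$\left\{ \left[\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right], \left[\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\right] \right\}$ is a basis for $D_2$, the diagonal subspace of $M_{22}$, so $\dim D_2 = 2$.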
Definition 4.6.13. $T$ is a maximally independent subset of $V$ iff $T$ is independent and not contained in any strictly larger independent subset of $V$.

$T$ is a minimal spanning set for $V$ iff $\operatorname{span} T = V$ and no strictly smaller subset of $T$ spans $V$.

Theorem 4.6.14. TFAE (the following are equivalent):
1. $B$ is a basis for $V$.
2. $B$ is a maximally independent subset of $V$.
3. $B$ is a minimal spanning set for $V$.


HW] §4.6: 11, 13, 16, 18, 41, 42, 43, 48, 49 


For #16, also give the dimension of this space.

4.7 Homogeneous systems

Recall that $\operatorname{null}(A) := \{x : Ax = 0\}$.

Definition 4.7.1. The nullity of a matrix $A$ is $\operatorname{nullity}(A) := \dim \operatorname{null}(A)$.


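Example 4.7.2. Find a basis for $\operatorname{null}(A)$ and compute $\operatorname{nullity}(A)$, where
\[
A = \begin{bmatrix} 1 & 1 & 4 & 1 & 2 \\ 0 & 1 & 2 & 1 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 2 & 1 & 6 & 0 & 1 \end{bmatrix}.
\]
Solution: applying row operations, we have
\[
A \sim \begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & 2 & 1 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 2 & 0 & 4 & 0 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & 2 & 0 & -1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]
The leading variables are $x_1$, $x_2$, and $x_4$, so make $x_3 = s$ and $x_5 = t$ and solve in terms of these to get
\[
Ax = 0 \implies x = \begin{bmatrix} -2s - t \\ -2s + t \\ s \\ -2t \\ t \end{bmatrix} = s \begin{bmatrix} -2 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ 1 \\ 0 \\ -2 \\ 1 \end{bmatrix}.
\]
So $\{b_1 = (-2, -2, 1, 0, 0),\ b_2 = (-1, 1, 0, -2, 1)\}$ is a basis of $\operatorname{null}(A)$ and $\operatorname{nullity}(A) = 2$.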
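A quick sanity check of this answer is to multiply $A$ by each proposed basis vector and confirm that the result is zero. The sketch below assumes the matrix of this example reads $A = [1\,1\,4\,1\,2;\ 0\,1\,2\,1\,1;\ 0\,0\,0\,1\,2;\ 2\,1\,6\,0\,1]$ (reconstructed from the row reduction); `matvec` is an illustrative helper.

```python
def matvec(M, x):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Assumed entries of A for Example 4.7.2.
A = [
    [1, 1, 4, 1, 2],
    [0, 1, 2, 1, 1],
    [0, 0, 0, 1, 2],
    [2, 1, 6, 0, 1],
]
b1 = [-2, -2, 1, 0, 0]
b2 = [-1, 1, 0, -2, 1]

print(matvec(A, b1))
print(matvec(A, b2))
```

Both products are the zero vector of $\mathbb{R}^4$, so $b_1, b_2 \in \operatorname{null}(A)$. They are independent because each has a 1 in a free-variable position ($x_3$ or $x_5$) where the other has a 0.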


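Example 4.7.3. For $A = \begin{bmatrix} 1 & 5 \\ 3 & -1 \end{bmatrix}$, find all real numbers $\lambda$ such that the homogeneous system $(\lambda I - A)x = 0$ has a nontrivial solution.
Solution: the system
\[
(\lambda I - A)x = \begin{bmatrix} \lambda - 1 & -5 \\ -3 & \lambda + 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0
\]
has nontrivial solutions if and only if
\[
0 = \det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -5 \\ -3 & \lambda + 1 \end{vmatrix} = (\lambda - 1)(\lambda + 1) - 15 = \lambda^2 - 16 = (\lambda - 4)(\lambda + 4),
\]
so $\lambda = \pm 4$.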


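Example 4.7.4. For $A = \begin{bmatrix} 1 & 5 \\ 3 & -1 \end{bmatrix}$ and each of $\lambda = \pm 4$, find a basis for $\operatorname{null}(\lambda I - A)$.

Solution: solve the homogeneous system $(\lambda I - A)x = 0$ for each value of $\lambda$.

For $\lambda = 4$,
\[
\lambda I - A = \begin{bmatrix} 3 & -5 \\ -3 & 5 \end{bmatrix} \sim \begin{bmatrix} 3 & -5 \\ 0 & 0 \end{bmatrix} \implies x = t \begin{bmatrix} 5 \\ 3 \end{bmatrix}, \quad \text{for any } t \in \mathbb{R}.
\]
For $\lambda = -4$,
\[
\lambda I - A = \begin{bmatrix} -5 & -5 \\ -3 & -3 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \implies x = t \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \text{for any } t \in \mathbb{R}.
\]
Then $\left\{ \left[\begin{smallmatrix} 5 \\ 3 \end{smallmatrix}\right] \right\}$ is a basis for $\operatorname{null}(4I - A)$ and $\left\{ \left[\begin{smallmatrix} 1 \\ -1 \end{smallmatrix}\right] \right\}$ is a basis for $\operatorname{null}(-4I - A)$.

Alternatively, $\left\{ \left[\begin{smallmatrix} 5 \\ 3 \end{smallmatrix}\right], \left[\begin{smallmatrix} 1 \\ -1 \end{smallmatrix}\right] \right\}$ is a basis for $\mathbb{R}^2$ consisting of eigenvectors of $A$.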
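The eigenvector claim can be checked directly: $x \in \operatorname{null}(\lambda I - A)$ means exactly $Ax = \lambda x$. A minimal sketch, assuming $A = [1\,5;\ 3\,{-1}]$ as in this example (`matvec` is an illustrative helper):

```python
def matvec(M, x):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[1, 5], [3, -1]]
pairs = [(4, [5, 3]), (-4, [1, -1])]   # (lambda, candidate eigenvector)

for lam, v in pairs:
    Av = matvec(A, v)
    # A v should equal lambda * v for each pair
    print(lam, v, Av, Av == [lam * vi for vi in v])
```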


4.7.1 Nonhomogeneous systems 


Recall that $\{x : Ax = b\}$ is a vector space if and only if $b = 0$.

However, we have the following theorem:

Theorem 4.7.5. Suppose that $x_p$ is a particular solution to the nonhomogeneous system $Ax = b$, that is, $Ax_p = b$. Then any solution of $Ax = b$ can be written as $x = x_p + x_h$, where $x_h$ is some solution of the associated homogeneous system $Ax = 0$.
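A short derivation of why this holds, sketched here using only the linearity of matrix multiplication (the name $x_h := x - x_p$ is chosen to match the theorem):

```latex
\begin{align*}
A(x - x_p) &= Ax - Ax_p = b - b = 0
  && \text{so } x_h := x - x_p \text{ solves } Ax = 0, \\
x &= x_p + (x - x_p) = x_p + x_h, \\
A(x_p + x_h) &= Ax_p + Ax_h = b + 0 = b
  && \text{conversely, every such sum solves } Ax = b.
\end{align*}
```

In other words, the solution set of $Ax = b$ is the null space of $A$ shifted by any one particular solution.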


HW] §2.2: 28, 29
HW] §4.7: 2, 15, 16, 19-21

4.8 Coordinates and isomorphism


Definition 4.8.1. An isomorphism is an invertible linear transformation $T : U \to V$ between two vector spaces $U$ and $V$. The spaces $U$ and $V$ are called isomorphic iff there exists an isomorphism between them, in which case one writes $U \cong V$.

Remark 4.8.2. Recall that a transformation is invertible if and only if it is both one-to-one and onto.
• $f : X \to Y$ is one-to-one (or injective) iff no two points of $X$ are mapped to the same point of $Y$:
  For $x_1, x_2 \in X$, $f(x_1) = f(x_2) \implies x_1 = x_2$.
• $f : X \to Y$ is onto (or surjective) iff for every point of $Y$, there is some point in $X$ that gets mapped to it:
  For every $y \in Y$, there exists $x \in X$ such that $f(x) = y$.
If $f$ is both injective and surjective, it is called bijective or said to be a bijection.
Formally, three properties (linearity, injectivity, and surjectivity) must be verified in order to check that something is an isomorphism. However, we will see that there are shortcuts.
Theorem 4.8.3. Let $U, V, W$ be vector spaces.
(a) $U \cong U$.
(b) If $U \cong V$, then $V \cong U$.
(c) If $U \cong V$ and $V \cong W$, then $U \cong W$.

Proof. HW (#28)


Example 4.8.4. $P_2$ is isomorphic to $\mathbb{R}^3$. Define $T : P_2 \to \mathbb{R}^3$ by $T(a + bt + ct^2) = (a, b, c)$.
To check that $T$ is linear, let $f = a + bt + ct^2$ and $g = p + qt + rt^2$ and recall
\[
(a + bt + ct^2) + (p + qt + rt^2) \longleftrightarrow (a, b, c) + (p, q, r).
\]
Then
\[
T(f + g) = T\big((a + p) + (b + q)t + (c + r)t^2\big) = (a + p,\ b + q,\ c + r),
\]
\[
T(f) + T(g) = (a, b, c) + (p, q, r) = (a + p,\ b + q,\ c + r).
\]
Scalar multiples are checked the same way: $T(kf) = (ka, kb, kc) = kT(f)$.
To check that $T$ is invertible, verify that the inverse is given by $T^{-1}(a, b, c) = a + bt + ct^2$:
\[
T^{-1}(T(a + bt + ct^2)) = T^{-1}(a, b, c) = a + bt + ct^2,
\]
\[
T(T^{-1}(a, b, c)) = T(a + bt + ct^2) = (a, b, c).
\]


Lemma 4.8.5 (Basis lemma). Let $B = \{b_1, b_2, \dots, b_n\}$ be a basis of $U$.

(i) If $T : U \to V$ is an isomorphism, then $T(B) = \{T(b_1), T(b_2), \dots, T(b_n)\}$ is a basis of $V$.

(ii) Conversely, if $\{c_1, c_2, \dots, c_n\}$ is a basis for $V$ and we define $T$ to be the linear transformation satisfying $T(b_i) = c_i$, then $T : U \to V$ is an isomorphism.

Proof. For (i), let $v \in V$; we need to write $v$ as a linear combination of the $T(b_i)$:
\[
\begin{aligned}
T^{-1}(v) &= x_1 b_1 + x_2 b_2 + \cdots + x_n b_n && B \text{ is a basis} \\
v &= T(x_1 b_1 + x_2 b_2 + \cdots + x_n b_n) && \text{apply } T \\
v &= x_1 T(b_1) + x_2 T(b_2) + \cdots + x_n T(b_n) && T \text{ is linear.}
\end{aligned}
\]
This shows that $T(B)$ is a spanning set. It follows from a homework problem that $T(B)$ is independent, so we have a basis.

For (ii), define $T$ to be the linear map that satisfies $T(b_i) = c_i$; it is immediate (by the properties of the basis) that this extends to a linear map from all of $U$ into $V$. You can check injectivity by writing elements of $U$ in terms of the basis $\{b_1, b_2, \dots, b_n\}$ and applying $T$, and surjectivity by writing elements of $V$ in terms of the basis $\{c_1, c_2, \dots, c_n\}$.


The point of this lemma is that an isomorphism basically amounts to a renaming. In other words, two vector spaces $U$ and $V$ are isomorphic if and only if there is:
1. a basis $B_1$ of $U$ and
2. a basis $B_2$ of $V$ and
3. a bijection between them.

Next, we show that every finite-dimensional vector space is isomorphic to $\mathbb{R}^n$, i.e., is $\mathbb{R}^n$ “up to a relabeling of the basis vectors”.


Theorem 4.8.6. $V$ is an $n$-dimensional real vector space iff $V$ is isomorphic to $\mathbb{R}^n$.

Proof. $V$ is $n$-dimensional iff it has a basis $\{v_1, v_2, \dots, v_n\}$ of $n$ vectors. Define $T : V \to \mathbb{R}^n$ by $T(v_j) = e_j$, where $e_j$ is the standard basis vector of $\mathbb{R}^n$ having $j$th entry 1 and consisting of 0s elsewhere. One can check that $T$ is an isomorphism.

For the converse, the Basis lemma says that $\{e_1, \dots, e_n\}$ is mapped to a basis of $V$ under the inverse of the isomorphism (which is itself an isomorphism $\mathbb{R}^n \to V$).


Corollary 4.8.7. Let $U, V$ be finite-dimensional. Then $U \cong V$ iff $\dim U = \dim V = n$.

Proof. If $U \cong V$, then let $\{u_1, \dots, u_n\}$ be a basis for $U$ and let $T : U \to V$ be the isomorphism between $U$ and $V$. By the Basis lemma, $\{T(u_1), \dots, T(u_n)\}$ is a basis for $V$ and $|\{u_1, \dots, u_n\}| = |\{T(u_1), \dots, T(u_n)\}|$.

If $\dim U = \dim V = n$ then both are isomorphic to $\mathbb{R}^n$ by the previous theorem, and hence they are isomorphic to each other by symmetry and transitivity of the equivalence relation $\cong$ (parts (b) and (c) of Theorem 4.8.3).


HW] §4.8: 28, 29, 30 


4.8.1 Coordinates and change of basis 


Suppose that $S = \{v_1, v_2, \dots, v_n\}$ is a basis for $V$. This is not necessarily an ordered set; when setting up a problem, it is often up to you to pick which vector to take as the “first” basis vector.

Definition 4.8.8. We say $S = (v_1, v_2, \dots, v_n)$ is an ordered basis when the order of the elements is important, so that $S$ is not the same as $(v_2, v_1, \dots, v_n)$. For an ordered basis, given
\[
u = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n,
\]
we write
\[
[u]_S = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
\]
to express $u$ in the coordinate system $S$.

For the standard coordinate system, we leave off the subscript, so
\[
u = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \iff u = c_1 e_1 + c_2 e_2 + \cdots + c_n e_n,
\]
where
\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.
\]


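Example 4.8.9. Recall the homework problem with $\beta_1 = (4, 1)$ and $\beta_2 = (-2, 1)$. You are asked to:
(i) Show that $\{\beta_1, \beta_2\}$ is a basis for $\mathbb{R}^2$ and solve $(2, 5) = a\beta_1 + b\beta_2$.
(ii) Find a matrix $A$ such that $T_A\left(\left[\begin{smallmatrix} 1 \\ 0 \end{smallmatrix}\right]\right) = \beta_1$ and $T_A\left(\left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]\right) = \beta_2$.
(iii) Find a matrix $B$ such that $T_B(\beta_1) = \left[\begin{smallmatrix} 1 \\ 0 \end{smallmatrix}\right]$ and $T_B(\beta_2) = \left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]$.
(iv) Use $T_B$ to represent the vector $(2, 5)$ in terms of the basis $\{\beta_1, \beta_2\}$.
Solutions:

(i) With what we know now, the quick way is to compute the determinant
\[
\begin{vmatrix} 4 & -2 \\ 1 & 1 \end{vmatrix} = 4 - (-2) = 6 \neq 0,
\]
so that the nonzero determinant implies linear independence of $\{\beta_1, \beta_2\}$. Since $\dim \mathbb{R}^2 = 2$, this is a maximally independent set and hence a basis. Then
\[
a\beta_1 + b\beta_2 = (2, 5) \iff \begin{bmatrix} 4 & -2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} \iff \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.
\]
(ii) To find $A$, solve
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}
\]
to find $A = [\beta_1\ \beta_2] = \begin{bmatrix} 4 & -2 \\ 1 & 1 \end{bmatrix}$.
(iii) Since $T_B$ does the opposite transform of $T_A$, it must be that $B = A^{-1}$, so
\[
B = \frac{1}{6} \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix}.
\]
(iv) Then to represent the vector $(2, 5)$ in terms of the basis $\{\beta_1, \beta_2\}$, compute
\[
T_B(2, 5) = T_B \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \frac{1}{6} \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\]
to find that
\[
\begin{bmatrix} 2 \\ 5 \end{bmatrix} = 2 \begin{bmatrix} 4 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} -2 \\ 1 \end{bmatrix} = 2\beta_1 + 3\beta_2.
\]
In terms of the ordered basis $S = (\beta_1, \beta_2)$, we would now write
\[
\left[ \begin{bmatrix} 2 \\ 5 \end{bmatrix} \right]_S = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.
\]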
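The computation in this example can be reproduced with exact arithmetic. This is a minimal sketch, assuming $\beta_1 = (4, 1)$ and $\beta_2 = (-2, 1)$; the helpers `inverse_2x2` and `matvec` are illustrative names, not notation from the text.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, x):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[4, -2], [1, 1]]        # columns are beta_1 and beta_2
B = inverse_2x2(A)           # sends beta_1 -> e_1 and beta_2 -> e_2
coords = matvec(B, [2, 5])   # coordinates of (2, 5) in the basis (beta_1, beta_2)

print(B)
print(coords)
print(matvec(A, coords))     # multiplying back by A should recover (2, 5)
```

Multiplying the coordinate vector by $A$ recovers the original vector, which is the sense in which $B = A^{-1}$ changes basis.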


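Definition 4.8.10. Suppose that $S = (v_1, v_2, \dots, v_n)$ and $T = (w_1, w_2, \dots, w_n)$ are ordered bases for $V$.

Denote the vector $v \in V$ as written in $T$-coordinates by
\[
[v]_T = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \iff v = c_1 w_1 + c_2 w_2 + \cdots + c_n w_n,
\]
and write each $w_j$ in $S$-coordinates as
\[
[w_j]_S = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{bmatrix} \iff w_j = a_{1j} v_1 + a_{2j} v_2 + \cdots + a_{nj} v_n.
\]
Then
\[
\begin{aligned}
[v]_S &= [c_1 w_1 + c_2 w_2 + \cdots + c_n w_n]_S \\
&= c_1 [w_1]_S + c_2 [w_2]_S + \cdots + c_n [w_n]_S \\
&= c_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{bmatrix} + c_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{bmatrix} + \cdots + c_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{bmatrix} \\
&= P_{S \leftarrow T} [v]_T,
\end{aligned}
\]
where $P_{S \leftarrow T}$ is the $n \times n$ matrix with $j$th column $[w_j]_S$. This matrix $P_{S \leftarrow T}$ is the transition matrix (or change-of-basis matrix) from the $T$-basis to the $S$-basis.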


To compute the change-of-basis matrix (here for $n = 3$), we need to solve
\[
\begin{aligned}
a_1 v_1 + a_2 v_2 + a_3 v_3 &= w_1 \\
b_1 v_1 + b_2 v_2 + b_3 v_3 &= w_2 \\
c_1 v_1 + c_2 v_2 + c_3 v_3 &= w_3
\end{aligned}
\]
Each of these equations is a system with augmented matrix $[v_1\ v_2\ v_3 \mid w_i]$. Since each system has the same coefficient matrix $[v_1\ v_2\ v_3]$, we can solve all three systems simultaneously by row-reducing
\[
[v_1\ v_2\ v_3 \mid w_1\ w_2\ w_3].
\]
This is an algorithm for how to find $P_{S \leftarrow T}$:
1. Set up the augmented matrix $[v_1 \cdots v_n \mid w_1 \cdots w_n]$.
2. Transform to RREF.
3. The columns on the right side form the columns of $P_{S \leftarrow T}$.


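Example 4.8.11. Consider
\[
S = \left\{ v_1 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, v_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}
\quad\text{and}\quad
T = \left\{ w_1 = \begin{bmatrix} 6 \\ 3 \\ 3 \end{bmatrix}, w_2 = \begin{bmatrix} 4 \\ -1 \\ 3 \end{bmatrix}, w_3 = \begin{bmatrix} 5 \\ 5 \\ 2 \end{bmatrix} \right\}.
\]
For the vector $v = (4, -9, 5)$, we have
\[
v = w_1 + 2w_2 - 2w_3 \implies [v]_T = \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix},
\qquad
v = 4v_1 - 5v_2 + v_3 \implies [v]_S = \begin{bmatrix} 4 \\ -5 \\ 1 \end{bmatrix}.
\]
The transition matrix $P_{S \leftarrow T}$ is found from
\[
\left[\begin{array}{ccc|ccc} 2 & 1 & 1 & 6 & 4 & 5 \\ 0 & 2 & 1 & 3 & -1 & 5 \\ 1 & 0 & 1 & 3 & 3 & 2 \end{array}\right]
\sim
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & 2 & 1 \\ 0 & 1 & 0 & 1 & -1 & 2 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right]
\]
and we verify
\[
P_{S \leftarrow T} [v]_T = \begin{bmatrix} 2 & 2 & 1 \\ 1 & -1 & 2 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix} = \begin{bmatrix} 4 \\ -5 \\ 1 \end{bmatrix} = [v]_S.
\]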
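The three-step algorithm for the transition matrix can be sketched in code. The following assumes the bases $S = ((2,0,1), (1,2,0), (1,1,1))$ and $T = ((6,3,3), (4,-1,3), (5,5,2))$; the functions `rref` and `transition_matrix` are illustrative names, and the row reduction is a plain Gauss-Jordan elimination with exact fractions rather than any particular library routine.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M (a list of rows), using exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]
        # clear the rest of the column
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

def transition_matrix(S, T):
    """P_{S<-T}: row-reduce [v_1 ... v_n | w_1 ... w_n] and read off the right block."""
    n = len(S)
    augmented = [[S[j][i] for j in range(n)] + [T[j][i] for j in range(n)]
                 for i in range(n)]
    return [row[n:] for row in rref(augmented)]

S = [(2, 0, 1), (1, 2, 0), (1, 1, 1)]
T = [(6, 3, 3), (4, -1, 3), (5, 5, 2)]
P = transition_matrix(S, T)
print(P)
```

The sketch assumes $S$ really is a basis, so the left block reduces to the identity; it does not check this.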


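\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\]
is the matrix that sends $e_j$ to $(a_{1j}, a_{2j}, \dots, a_{nj})$, for example:
\[
Ae_2 = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{bmatrix}.
\]
Therefore, $A^{-1}$ can be thought of as the matrix that sends $(a_{1j}, a_{2j}, \dots, a_{nj})$ to $e_j$, and hence transforms a given basis into the standard basis.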


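Lemma 4.8.13. Let $S$ and $T$ be bases of $V$, and let $M_S$ be the matrix whose columns are the elements of $S$, and let $M_T$ be the matrix whose columns are the elements of $T$. Then
\[
P_{S \leftarrow T} = M_S^{-1} M_T.
\]
Note that $[v]_S = P_{S \leftarrow T} [v]_T = M_S^{-1} M_T [v]_T$ iff $M_S [v]_S = M_T [v]_T = v$. For example, with $S = (\beta_1, \beta_2)$ from Example 4.8.9:
\[
M_S [v]_S = \begin{bmatrix} 4 & -2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} = 2e_1 + 5e_2.
\]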


HW] §4.8: 13, 15, 18, 32, 33, 35, 39-41 




4.9 Rank 


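Definition 4.9.1. For an $m \times n$ matrix
\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},
\]
the row space of $A$ is the subspace of $\mathbb{R}^n$ spanned by the rows, and the column space of $A$ is the subspace of $\mathbb{R}^m$ spanned by the columns:
\[
\operatorname{rowspace}(A) := \operatorname{span}\left\{ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \end{bmatrix}, \dots, \begin{bmatrix} a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \right\} \subseteq \mathbb{R}^n,
\]
\[
\operatorname{colspace}(A) := \operatorname{span}\left\{ \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \dots, \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \right\} \subseteq \mathbb{R}^m.
\]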


Theorem 4.9.2. If $A$ is row-equivalent to $B$, then $\operatorname{rowspace}(A) = \operatorname{rowspace}(B)$, and similarly for columns.

Proof. $A$ is row-equivalent to $B$ if and only if $B$ can be obtained from $A$ by some sequence of row operations, which means that each row of $B$ is a linear combination of rows of $A$. Thus the rows of $A$ span $\operatorname{rowspace}(B)$, so $\operatorname{rowspace}(B) \subseteq \operatorname{rowspace}(A)$. The other containment follows by the same argument, since row operations are reversible. The result for columns (with column operations) also follows by the same argument.


Suppose we apply this theorem to the case when $B$ is the RREF of $A$. Then:

1. $A$ and $B$ are clearly row-equivalent, so the rows of $B$ span $\operatorname{rowspace}(A)$, and

2. the nonzero rows of $B$ are linearly independent, so the nonzero rows of $B$ actually form a basis for $\operatorname{rowspace}(A)$.

Example 4.9.3. Find a basis for the subspace of $\mathbb{R}^5$ spanned by


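\[
v_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 3 \\ -4 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} 3 \\ 2 \\ 8 \\ 1 \\ 4 \end{bmatrix}, \quad
v_3 = \begin{bmatrix} 2 \\ 3 \\ 7 \\ 2 \\ 3 \end{bmatrix}, \quad
v_4 = \begin{bmatrix} -1 \\ 2 \\ 0 \\ 4 \\ -3 \end{bmatrix}.
\]
Solution: write the vectors as the rows of a matrix and row-reduce:
\[
A = \begin{bmatrix} 1 & -2 & 0 & 3 & -4 \\ 3 & 2 & 8 & 1 & 4 \\ 2 & 3 & 7 & 2 & 3 \\ -1 & 2 & 0 & 4 & -3 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = B.
\]
So one basis for $\operatorname{span}\{v_1, v_2, v_3, v_4\}$ is
\[
\mathcal{B} = \left\{ b_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \quad b_3 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ -1 \end{bmatrix} \right\}.
\]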
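One convenient feature of a basis taken from the RREF is that coordinates can be read off from the pivot positions: because each $b_k$ has a 1 in its own pivot column and 0 in the other pivot columns, the coefficient of $b_k$ in any vector of the span is that vector's entry in the $k$th pivot column. The sketch below checks this for the four vectors of this example; `combo` and `pivot_columns` are illustrative names.

```python
def combo(coeffs, vectors):
    """Linear combination sum_k coeffs[k] * vectors[k], returned as a tuple."""
    length = len(vectors[0])
    return tuple(sum(c * vec[i] for c, vec in zip(coeffs, vectors))
                 for i in range(length))

v = [(1, -2, 0, 3, -4), (3, 2, 8, 1, 4), (2, 3, 7, 2, 3), (-1, 2, 0, 4, -3)]
b = [(1, 0, 2, 0, 1), (0, 1, 1, 0, 1), (0, 0, 0, 1, -1)]
pivot_columns = [0, 1, 3]   # 0-based positions of the leading 1s of b1, b2, b3

for vec in v:
    coeffs = [vec[p] for p in pivot_columns]
    # each original vector should be rebuilt exactly from b1, b2, b3
    print(coeffs, combo(coeffs, b) == vec)
```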


Definition 4.9.4. The dimension of rowspace(A) is called the row rank of A (and similarly 


for the column space). 


From the theorem, we know that if A and B are row-equivalent, then they have the 


same row rank, etc. 


Corollary 4.9.5. If B is the RREF of A, then the row rank of A is equal to the number 


of nonzero rows of B. 
In the previous example, we found that the row rank of A is 3. 


Theorem 4.9.6. The row rank and the column rank of A are equal. 


Proof. Let $B$ be the RREF of $A$. The row rank of $A$ is $k \le n$ if and only if the columns of $B$ include the first $k$ standard basis vectors $\{e_1, \dots, e_k\}$ (these are the pivot columns). Note that any other nonzero column of $B$ is a linear combination of these, so the column rank of $B$ is $k$. Row operations do not change the linear relations among the columns, so the column rank of $A$ is also $k$.

Since row rank and column rank are equal, the following definition makes sense:


Definition 4.9.7. The rank of A is the row rank (or equivalently, the column rank) of A 
and is denoted rank(A). 


Theorem 4.9.8 (Rank theorem). If A is an m x n matrix, then rank A + nullity A =n. 


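Let $T_A : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation given by $T_A(x) = Ax$. Recall that we proved $\ker T_A = \operatorname{null} A$. Now we can also see:
\[
\operatorname{nullity} A = \dim \ker T_A \quad\text{and}\quad \operatorname{rank} A = \dim \operatorname{ran} T_A.
\]
If the range has dimension $r$, then the Rank theorem just asserts that
\[
n = (n - r) + r.
\]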


Roughly speaking, this means that for any given $A$, $\mathbb{R}^n$ decomposes into a direct sum (or Cartesian product) of the subspace of vectors which are killed by $A$ and a complementary subspace on which $A$ kills nothing but 0:
\[
\mathbb{R}^n = \operatorname{null} A \oplus \operatorname{rowspace} A.
\]
More precisely, given a matrix $A$ of rank $r$, there is
• an $(n - r)$-dimensional subspace $K = \operatorname{null} A = \ker T_A$,
• an $r$-dimensional subspace $L = \operatorname{rowspace} A$, on which $T_A$ is one-to-one (so $T_A$ restricts to an invertible map from $L$ onto $\operatorname{ran} T_A$), and
• every vector in $K$ is orthogonal to every vector in $L$, that is, $L = K^\perp$.

This means that any $x \in \mathbb{R}^n$ can be written uniquely as $x = y + z$ with $y \in K$ and $z \in L$, and $y \cdot z = 0$. Note that $\operatorname{null}(A) \cap \operatorname{rowspace}(A) = \{0\}$.


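Example 4.9.9. Let

Then $\operatorname{rank} A = 2$ and $3 = \dim \mathbb{R}^3 = \operatorname{rank} A + \operatorname{nullity} A = 2 + 1$. So $\operatorname{rowspace} A$ is a (2-dimensional) plane passing through the origin of $\mathbb{R}^3$ and $\operatorname{null} A$ is a line passing through the origin of $\mathbb{R}^3$ which meets $\operatorname{rowspace} A$ orthogonally.

From $B$, a basis for $\operatorname{rowspace} A$ is given by
\[
\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}.
\]
Solving the homogeneous system gives
\[
Ax = 0 \implies x = \begin{bmatrix} -t \\ -t \\ t \end{bmatrix},
\]
so a basis for $\operatorname{null} A$ is given by
\[
\left\{ \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} \right\}.
\]
These are orthogonal subspaces:
\[
\left( a \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right) \cdot t \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} a \\ b \\ a + b \end{bmatrix} \cdot \begin{bmatrix} -t \\ -t \\ t \end{bmatrix} = (a)(-t) + (b)(-t) + (a + b)(t) = 0.
\]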

The rank theorem states that $\operatorname{rank} A + \operatorname{nullity} A = n$. In the case when $A$ is $n \times n$, note that $\operatorname{nullity} A = 0$ if and only if $Ax = 0$ has only the trivial solution $x = 0$. This means we have an update:

Theorem 4.9.10 (Characterization of Invertibility). For an $n \times n$ matrix $A$, the following are equivalent:
1) $A$ is invertible (i.e., $A^{-1}$ exists).
2) $A$ can be written as the product of elementary matrices.
3) The RREF of $A$ is $I$ (so $A$ is row equivalent to $I$).
4) The system $Ax = b$ has exactly one solution for every $b$.
5) The system $Ax = 0$ has only the trivial solution $x = 0$.
6) $\det A \neq 0$.
7) The rows/columns of $A$ are linearly independent vectors.
8) The rows/columns of $A$ span $\mathbb{R}^n$.
9) $\operatorname{nullity}(A) = 0$.
10) $\operatorname{rank}(A) = n$.


HW] §4.9: 1, 5, 13, 15, 39, 40 


For #15, note that you can use column operations also.

Chapter 5


Length and direction 


5.1 Vector arithmetic and norms 


Recall from matrices: 

Theorem 5.1.1. If $u, v, w$ are vectors of the same length, and $k, \ell \in \mathbb{R}$, then
(i) $u + v = v + u$,
(ii) $u + (v + w) = (u + v) + w = u + v + w$,
(iii) $u + 0 = 0 + u = u$,
(iv) $u + (-u) = 0$,
(v) $k(\ell u) = (k\ell)u$,
(vi) $(k + \ell)u = ku + \ell u$,
(vii) $k(u + v) = ku + kv$, and
(viii) $1u = u$.

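Definition 5.1.2. The norm of a vector $u = (u_1, u_2) \in \mathbb{R}^2$ is its length
\[
\|u\| := \sqrt{u_1^2 + u_2^2}.
\]
In general, the norm of a vector $u = (u_1, \dots, u_n) \in \mathbb{R}^n$ is
\[
\|u\| := \left( \sum_{i=1}^n u_i^2 \right)^{1/2}.
\]
The norm of a sequence $\{u_i\}_{i=1}^\infty = (u_1, u_2, \dots) \in \mathbb{R}^{\mathbb{N}}$ is
\[
\|u\| := \left( \sum_{i=1}^\infty u_i^2 \right)^{1/2}.
\]
The norm of a function $u = u(x) \in \mathbb{R}^{\mathbb{R}}$ is
\[
\|u\| := \left( \int u(x)^2 \, dx \right)^{1/2}.
\]
(In the last two cases, the norm is only defined when the sum or integral is finite.)

NOTE: $\|\cdot\| : \mathbb{R}^n \to [0, \infty)$ is a function on vectors. It is actually a continuous function. In fact, it is differentiable everywhere except at $x = 0$.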
Theorem 5.1.3. For $x, y \in \mathbb{R}^n$ and any scalar $k$,
(i) $\|x\| \ge 0$.
(ii) $\|x\| = 0$ iff $x = 0$.
(iii) $\|kx\| = |k| \, \|x\|$.
(iv) $\|x + y\| \le \|x\| + \|y\|$. (Triangle inequality)

Proof of (i)-(iii). Homework.

Proof of (iv). Postponed.


5.1.1 Distance and length 


The distance between two points is the size of the space between them, i.e., the length of the vector connecting them.

Definition 5.1.4. For $x, y \in \mathbb{R}^n$, the distance from $x$ to $y$ is the length of $x - y$:
\[
\operatorname{dist}(x, y) = \|x - y\| = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}.
\]
NOTE: sometimes it is not easy working with $\|x\|$ because of the square root. In this case, use $\|x\|^2 = x_1^2 + \cdots + x_n^2$.

Example 5.1.5. The surface defined by $x_1^2 + x_2^2 = x_3$ is a paraboloid. The surface defined by $y_1 + y_2 + y_3 = 0$ is a plane. What is the closest point on the plane to the paraboloid?

Solution. You need to minimize $\|x - y\|$, where $x$ is on the paraboloid and $y$ is on the plane. However, $\sqrt{\cdot}$ is an increasing function, and hence order-preserving. This means that it is enough to find $x, y$ minimizing $\|x - y\|^2$.


5.1.2 Dot products and projections 


We have another function mapping vectors to numbers, but this one actually takes TWO 


vectors. 


Definition 5.1.6. If $x, y \in \mathbb{R}^n$, the dot product of $x$ and $y$ is
\[
x \cdot y := \sum_{i=1}^n x_i y_i.
\]
Sometimes this is called the inner product and written $\langle x, y \rangle$ or $\langle x | y \rangle$.
The dot product is also a differentiable function $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ (even at 0!); it is just a polynomial in $2n$ variables.

Theorem 5.1.7. $\|x\| = \sqrt{x \cdot x}$.

Proof. HW.


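Recall the law of cosines: if $a, b, c$ are the side lengths of a triangle and $\theta$ is the angle opposite $c$, then
\[
c^2 = a^2 + b^2 - 2ab\cos\theta.
\]
If $\theta = \frac{\pi}{2}$, then $c$ is the hypotenuse of a right triangle: Pythagorean theorem!

Theorem 5.1.8. If $x, y \in \mathbb{R}^n$, then $x \cdot y = \|x\| \|y\| \cos\theta$, where $\theta$ is the angle between them.

Proof. For $x, y \in \mathbb{R}^n$, the law of cosines gives
\[
\begin{aligned}
\|y - x\|^2 &= \|x\|^2 + \|y\|^2 - 2\|x\| \|y\| \cos\theta \\
\|x\| \|y\| \cos\theta &= \tfrac{1}{2} \left( \|x\|^2 + \|y\|^2 - \|y - x\|^2 \right) \\
&= \tfrac{1}{2} \left( \sum_{i=1}^n x_i^2 + \sum_{i=1}^n y_i^2 - \sum_{i=1}^n (y_i - x_i)^2 \right) \\
&= \tfrac{1}{2} \left( \sum_{i=1}^n x_i^2 + \sum_{i=1}^n y_i^2 - \sum_{i=1}^n (y_i^2 - 2x_i y_i + x_i^2) \right) \\
&= \tfrac{1}{2} \left( \sum_{i=1}^n 2 x_i y_i \right) \\
&= x \cdot y.
\end{aligned}
\]

Theorem 5.1.9. The angle $\theta \in [0, \pi]$ between nonzero vectors $x$ and $y$ is given by $\cos\theta = \dfrac{x \cdot y}{\|x\| \|y\|}$.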


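Corollary 5.1.10. The sign of $x \cdot y$ corresponds to the angle between them:
\[
\begin{aligned}
\theta \text{ is obtuse} &\iff x \cdot y < 0, \\
\theta \text{ is } \tfrac{\pi}{2} &\iff x \cdot y = 0, \\
\theta \text{ is acute} &\iff x \cdot y > 0.
\end{aligned}
\]

Definition 5.1.11. $x$ and $y$ are orthogonal iff $x \cdot y = 0$. (From second part above.)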
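The angle formula and the sign test can be tried out numerically. This is a minimal sketch using only the standard library; the function names `dot`, `norm`, `angle`, and `classify` are illustrative, and the clamping step is an implementation detail added to guard against floating-point rounding, not part of the theorem.

```python
import math

def dot(x, y):
    """Dot product of two vectors of the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm, ||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def angle(x, y):
    """Angle in [0, pi] between nonzero vectors x and y."""
    c = dot(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding error

def classify(x, y):
    """Classify the angle between x and y by the sign of the dot product."""
    d = dot(x, y)
    return "obtuse" if d < 0 else "right" if d == 0 else "acute"

print(angle((1, 0), (1, 1)), classify((1, 0), (1, 1)))
print(angle((1, 1), (-1, 1)), classify((1, 1), (-1, 1)))
print(angle((2, -3), (4, 3)), classify((2, -3), (4, 3)))
```

Note that `classify` never computes a square root or an inverse cosine: the sign of the dot product alone decides the case, which is the content of the corollary.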


5.1.3 Arithmetic of the Dot Product 


Theorem 5.1.12. If $u, v, w$ are vectors and $k$ is a scalar, then
1. (Commutativity) $u \cdot v = v \cdot u$.
2. (Distributivity) $u \cdot (v + w) = u \cdot v + u \cdot w$.
3. (Scalar associativity) $k(u \cdot v) = (ku) \cdot v$.
4. (Positive definite) $v \cdot v > 0$ iff $v \neq 0$, and $v \cdot v = 0$ iff $v = 0$.

The last one should be clear from the earlier theorem:
\[
v \cdot v > 0 \iff \sqrt{v \cdot v} = \|v\| > 0.
\]
How about:
1. $u \cdot (v \cdot w) = (u \cdot v) \cdot w$?
2. Given $u \neq 0$, is there a unique $\mathbf{1}$ such that $u \cdot \mathbf{1} = u$?
3. Given $u \neq 0$, is there a unique $v$ such that $u \cdot v = 1$?

Also: before doing the problems from the text, obtain the following results so that you can use them to simplify your computations:
(i) Prove that $\|x\| \ge 0$ for every $x \in \mathbb{R}^n$.
(ii) For $x \in \mathbb{R}^n$, prove that $\|x\| = 0$ iff $x = 0$.
(iii) For $x \in \mathbb{R}^n$, prove that $\|x\| = \|-x\|$.
(iv) Describe the geometric meaning of (ii) and (iii).


5.1.4 Projections 


We want to decompose a vector in terms of others, for example, to express it in terms of a given basis.

Example 5.1.13. Let $u = (2, -3)$. Consider the standard basis vectors $e_1 = (1, 0)$ and $e_2 = (0, 1)$.

Now $u = (u \cdot e_1)e_1 + (u \cdot e_2)e_2 = 2e_1 - 3e_2$ is a decomposition of $u$ into a linear combination of $e_1$ and $e_2$.

$u \cdot e_i$ tells you how long $u$ is in the $e_i$-direction, i.e., the component of $u$ that is parallel to $e_i$. In general,
\[
u = \sum_{i=1}^n (u \cdot e_i) e_i.
\]
This can be more complicated if vectors other than $e_1$ and $e_2$ are used.


Example 5.1.14. Let $u = (2, -3)$ again, but let $x = (4, 3)$ and $y = (2, 2)$.
\[
u \cdot x = (2)(4) + (-3)(3) = 8 - 9 = -1
\]
\[
u \cdot y = (2)(2) + (-3)(2) = 4 - 6 = -2.
\]
Now
\[
(u \cdot x)x + (u \cdot y)y = (-1)x + (-2)y = (-8, -7).
\]
This isn’t even in the same direction as $u$! What happened? In the first example, we had $e_1 \cdot e_2 = (1)(0) + (0)(1) = 0$, i.e., the basis vectors were orthogonal (perpendicular). Here $x \cdot y = 14 \neq 0$.


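Example 5.1.15. Let $u = (2, -3)$ again, but let $x = (1, 1)$ and $y = (-1, 1)$. Then
\[
x \cdot y = (1)(-1) + (1)(1) = 0,
\]
so $x$ and $y$ are perpendicular.
\[
u \cdot x = (2)(1) + (-3)(1) = 2 - 3 = -1
\]
\[
u \cdot y = (2)(-1) + (-3)(1) = -2 - 3 = -5.
\]
Now
\[
(u \cdot x)x + (u \cdot y)y = (-1)x + (-5)y = (-1, -1) + (5, -5) = (4, -6) = 2u.
\]
This is a scalar multiple of $u$! Why not just $u$? Because $\|x\|, \|y\| \neq 1$. Now:
\[
\frac{u \cdot x}{\|x\|^2} x + \frac{u \cdot y}{\|y\|^2} y = -\tfrac{1}{2} x - \tfrac{5}{2} y = \left( -\tfrac{1}{2}, -\tfrac{1}{2} \right) + \left( \tfrac{5}{2}, -\tfrac{5}{2} \right) = (2, -3) = u.
\]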


Definition 5.1.16. Let $\{x, y\}$ be a basis for $\mathbb{R}^2$.

If $x \cdot y = 0$, then $\{x, y\}$ is an orthogonal basis for $\mathbb{R}^2$.
If $x \cdot y = 0$ and $\|x\| = \|y\| = 1$, then $\{x, y\}$ is an orthonormal basis for $\mathbb{R}^2$.


Theorem 5.1.17. The length of $u$ in the direction of $x$ is $\dfrac{u \cdot x}{\|x\|} = u \cdot \dfrac{x}{\|x\|}$. Therefore, the (orthogonal) projection of $u$ onto (the line spanned by) $x$ is
\[
\operatorname{proj}_x u := \frac{u \cdot x}{\|x\|^2} x.
\]
Note: this is a scalar multiple of $x$, so it is certainly in the same direction. ($x \neq 0$ or it doesn’t make sense.)

Corollary 5.1.18 (Orthogonal decomposition). The component of $u$ which is orthogonal to $x$ is
\[
u - \operatorname{proj}_x u = u - \frac{u \cdot x}{\|x\|^2} x.
\]
This is helpful if you have only one vector $x$, but not a basis. In fact, when $u$ is not a multiple of $x$, it produces an orthogonal basis of $\operatorname{span}\{x, u\}$:
\[
\{x,\ u - \operatorname{proj}_x u\}.
\]


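Proof of the theorem and the corollary. Define
\[
w_1 := \operatorname{proj}_x u, \qquad w_2 := u - \operatorname{proj}_x u.
\]
Now $w_2$ is orthogonal to $x$ because
\[
\begin{aligned}
w_2 \cdot x &= (u - \operatorname{proj}_x u) \cdot x \\
&= u \cdot x - \frac{u \cdot x}{\|x\|^2} \, x \cdot x \\
&= u \cdot x - u \cdot x = 0 && x \cdot x = \|x\|^2.
\end{aligned}
\]
Now since $w_1$ is in the direction of $x$, we know $w_1 = kx$ for some $k$. To find $k$,
\[
\begin{aligned}
u = w_1 + w_2 = kx + w_2 \implies u \cdot x &= (kx + w_2) \cdot x \\
&= k(x \cdot x) + w_2 \cdot x && \text{dot product arithmetic} \\
&= k\|x\|^2 + w_2 \cdot x && x \cdot x = \|x\|^2 \\
&= k\|x\|^2 && w_2 \cdot x = 0 \\
k &= \frac{u \cdot x}{\|x\|^2}.
\end{aligned}
\]
This shows that the projection onto $x$ is
\[
\operatorname{proj}_x u = w_1 = kx = \frac{u \cdot x}{\|x\|^2} x.
\]

Corollary 5.1.19. If $\{v_i\}_{i=1}^n$ is an orthogonal basis of $\mathbb{R}^n$, then for any $u \in \mathbb{R}^n$,
\[
u = \sum_{i=1}^n \operatorname{proj}_{v_i} u = \sum_{i=1}^n \frac{u \cdot v_i}{\|v_i\|^2} v_i.
\]
If $\{v_i\}_{i=1}^n$ is an orthonormal basis of $\mathbb{R}^n$, then for any $u \in \mathbb{R}^n$,
\[
u = \sum_{i=1}^n \operatorname{proj}_{v_i} u = \sum_{i=1}^n (u \cdot v_i) v_i.
\]
Recall that a basis allows us to break a vector into parts and deal with each part separately:
\[
T_A(u) = Au = A \sum_{i=1}^n (u \cdot v_i) v_i = \sum_{i=1}^n (u \cdot v_i) A v_i = \sum_{i=1}^n (u \cdot v_i) T_A(v_i).
\]
We saw this formula before, but orthogonality or orthonormality means that now the coefficients are given more explicitly (provided $\{v_i\}$ is given).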
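The projection formula and the expansion in an orthogonal basis can be sketched as follows, using the vectors $u = (2, -3)$, $x = (1, 1)$, $y = (-1, 1)$ from Example 5.1.15. The function names `proj` and `expand_orthogonal` are illustrative, and the sketch assumes integer (or rational) entries so that exact fractions can be used.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(x, y))

def proj(x, u):
    """Orthogonal projection of u onto the line spanned by a nonzero vector x."""
    xx = dot(x, x)
    if xx == 0:
        raise ValueError("cannot project onto the zero vector")
    k = Fraction(dot(u, x), xx)          # k = (u . x) / ||x||^2
    return tuple(k * xi for xi in x)

def expand_orthogonal(u, basis):
    """Sum of the projections of u onto each vector of an orthogonal basis."""
    total = tuple(Fraction(0) for _ in u)
    for v in basis:
        total = tuple(t + p for t, p in zip(total, proj(v, u)))
    return total

u = (2, -3)
x, y = (1, 1), (-1, 1)

w1 = proj(x, u)                               # component of u parallel to x
w2 = tuple(ui - wi for ui, wi in zip(u, w1))  # component of u orthogonal to x
print(w1, w2, dot(w2, x))
print(expand_orthogonal(u, [x, y]))           # should reproduce u
```

The expansion only reproduces $u$ when the basis is orthogonal; with the non-orthogonal pair from Example 5.1.14 the same sum gives a different vector.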


HW] §5.1 


(i) Find an orthonormal basis for the subspace of $\mathbb{R}^3$ consisting of vectors of the form $(a, a + b, b)$.

(ii) Find the projections of each vector onto each of the other two:
\[
a = \begin{bmatrix} 5 \\ 5 \\ -5 \\ 5 \end{bmatrix}, \quad b = \begin{bmatrix} 4 \\ 2 \\ -4 \\ 8 \end{bmatrix}, \quad c = \begin{bmatrix} 3 \\ 1 \\ -3 \\ 9 \end{bmatrix}.
\]

(iii) Find an orthonormal basis for $\mathbb{R}^2$ that contains a vector parallel to $(1, 1)$ and write each of the following vectors in terms of it: $a = (-2, 1)$, $b = (3, 4)$, $e_1 = (1, 0)$, $e_2 = (0, 1)$. Also, sketch the orthogonal decomposition of each vector, in the basis you found.


5.2 Cross Product 


Skip.

5.3 Inner product spaces

Recall the properties of the dot product:
Theorem 5.3.1. If $u, v, w$ are vectors and $k$ is a scalar, then
1. (Commutativity) $u \cdot v = v \cdot u$.
2. (Positive definite) $v \cdot v > 0$ iff $v \neq 0$, and $v \cdot v = 0$ iff $v = 0$.
3. (Distributivity) $u \cdot (v + w) = u \cdot v + u \cdot w$.
4. (Scalar associativity) $k(u \cdot v) = (ku) \cdot v$.
We now use this to define general inner products axiomatically:


Definition 5.3.2. If $V$ is a real vector space, then an inner product on $V$ is a symmetric positive definite bilinear mapping $V \times V \to \mathbb{R}$. In other words, any function that assigns to each pair of vectors $u, v$ a real number $\langle u, v \rangle$, in such a way that
1. (Symmetry) $\langle u, v \rangle = \langle v, u \rangle$.
2. (Positive definite) $\langle v, v \rangle > 0$ iff $v \neq 0$, and $\langle v, v \rangle = 0$ iff $v = 0$.
3. (Linearity) $\langle u, av + bw \rangle = a\langle u, v \rangle + b\langle u, w \rangle$, for $a, b \in \mathbb{R}$.

Definition 5.3.3. An inner product space is a vector space V, equipped with an inner 


product. 


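Example 5.3.4. $\mathbb{R}^n$ is an inner product space with the dot product, which is also called the standard inner product. In this case, the inner product is given by a matrix product
\[
\langle u, v \rangle = u \cdot v = \sum_{i=1}^n u_i v_i = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u^T v.
\]
If $C$ is a matrix satisfying certain properties, $\mathbb{R}^n$ can also be equipped with a nonstandard inner product
\[
\langle u, v \rangle_C = u^T C v = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.
\]
We’ll see that in order for $C$ to give an inner product in this way:
1. (Symmetry) $\langle u, v \rangle_C = \langle v, u \rangle_C$ for all $u, v$ iff $C = C^T$.
2. (Positive definite) $\langle \cdot, \cdot \rangle_C$ is positive definite iff $v^T C v > 0$ whenever $v \neq 0$.
3. (Linearity) This follows automatically from the linearity of matrix multiplication.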


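One important way to find a symmetric and positive definite matrix $C$ is by using an ordered basis $S = (u_1, u_2, \dots, u_n)$: Given $S$, define $C$ by $c_{ij} = \langle u_i, u_j \rangle$.

Theorem 5.3.5. Let $S = (u_1, \dots, u_n)$ be an ordered basis for an inner product space $V$ and define $C = [c_{ij}]$ by $c_{ij} = \langle u_i, u_j \rangle$. Then
(i) $C$ is symmetric.
(ii) $C$ determines $\langle v, w \rangle$ for every $v, w \in V$ in the sense that $\langle v, w \rangle = [v]_S^T C [w]_S$.

Proof. (i) This is straightforward: $c_{ij} = \langle u_i, u_j \rangle = \langle u_j, u_i \rangle = c_{ji}$.

(ii) Write $v$ and $w$ in terms of the basis as
\[
v = \sum_{i=1}^n a_i u_i \quad\text{and}\quad w = \sum_{j=1}^n b_j u_j,
\]
so that $[v]_S = (a_1, \dots, a_n)$ and $[w]_S = (b_1, \dots, b_n)$. Then
\[
\langle v, w \rangle = \left\langle \sum_{i=1}^n a_i u_i, \sum_{j=1}^n b_j u_j \right\rangle = \sum_{i=1}^n \sum_{j=1}^n a_i b_j \langle u_i, u_j \rangle = \sum_{i=1}^n \sum_{j=1}^n a_i c_{ij} b_j = [v]_S^T C [w]_S.
\]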


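Example 5.3.6. Let $S = (e_1, e_2, \dots, e_n)$ be the standard basis on $\mathbb{R}^n$. Then
\[
\langle e_i, e_j \rangle = e_i \cdot e_j = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases}
\]
so that $C$ defined by $c_{ij} = \langle e_i, e_j \rangle$ is $C = I$. Then
\[
\langle v, w \rangle = [v]_S^T I [w]_S = v^T w = v \cdot w.
\]

Theorem 5.3.7. In $\mathbb{R}^n$, $Ax \cdot y = x \cdot A^T y$ and $x \cdot Ay = A^T x \cdot y$, for any $n \times n$ matrix $A$.

Proof. $\langle Ax, y \rangle = \langle y, Ax \rangle = y^T A x = (y^T A)x = (A^T y)^T x = \langle A^T y, x \rangle = \langle x, A^T y \rangle$, and similarly for the other one.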


Example 5.3.8. If $C$ is defined in terms of a basis as above, then it is guaranteed to be positive definite. In general, however, it can be tricky to check that a given matrix $C$ is positive definite. To do this, one would compute as follows: suppose
\[
C = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.
\]
Then
\[
x^T C x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2x_1^2 + 2x_1 x_2 + 2x_2^2 = x_1^2 + x_2^2 + (x_1 + x_2)^2,
\]
which is strictly positive iff $x \neq 0$.
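The algebra in this example can be spot-checked on a grid of sample points. This is only a sanity check on finitely many integer vectors, not a proof of positive definiteness; the matrix $C = [2\,1;\ 1\,2]$ is taken from the example, and `quad_form` and `completed_square` are illustrative names.

```python
def quad_form(C, x):
    """x^T C x for a square matrix C (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))

def completed_square(x1, x2):
    """The rewritten form x1^2 + x2^2 + (x1 + x2)^2 from the example."""
    return x1 ** 2 + x2 ** 2 + (x1 + x2) ** 2

C = [[2, 1], [1, 2]]
samples = [(i, j) for i in range(-3, 4) for j in range(-3, 4)]

# the two expressions agree on every sample point
agree = all(quad_form(C, s) == completed_square(*s) for s in samples)
# the form is strictly positive away from the origin
positive = all(quad_form(C, s) > 0 for s in samples if s != (0, 0))
print(agree, positive)
```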


Example 5.3.9. Let $V = C(0, 1)$ be the vector space of all continuous functions on the closed unit interval $[0, 1]$. For $f, g \in V$, define
\[
\langle f, g \rangle = \int_0^1 f(t) g(t) \, dt.
\]
This makes $C(0, 1)$ into an inner product space, as we now check:

1. (Symmetry) $\langle f, g \rangle = \langle g, f \rangle$.

This boils down to the commutativity of multiplication for real numbers:
\[
\int_0^1 f(t) g(t) \, dt = \int_0^1 g(t) f(t) \, dt.
\]

2. (Positive definite) $\langle f, f \rangle > 0$ iff $f \neq 0$, and $\langle f, f \rangle = 0$ iff $f = 0$.

This is immediate from basic properties of the integral:
\[
f(t)^2 \ge 0 \implies \int_0^1 f(t)^2 \, dt \ge 0.
\]
Also, if $f$ is nonzero anywhere, then $f$ is nonzero on some interval $[a, b] \subseteq [0, 1]$ (because $f$ is continuous). Consequently, for some $\varepsilon$ we have $f(t)^2 > \varepsilon > 0$ for $a < t < b$ and
\[
\int_0^1 f(t)^2 \, dt \ge \int_a^b f(t)^2 \, dt \ge \int_a^b \varepsilon \, dt = \varepsilon(b - a) > 0.
\]
For the converse, it is clear that the integral of the zero function is 0.

3. (Linearity) This is immediate from the linearity properties of integrals:
\[
\int_0^1 (a f(t) + b g(t)) \, dt = a \int_0^1 f(t) \, dt + b \int_0^1 g(t) \, dt,
\]
so that $\langle h, af + bg \rangle = a\langle h, f \rangle + b\langle h, g \rangle$ for any $h \in V$.


Example 5.3.10. A standard and classical use of nonstandard inner products in the above context is to replace

$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$$

with

$$\langle f, g\rangle_w = \int_0^1 f(t)g(t)w(t)\,dt,$$

where w(t) > 0 is a weight function, for example, a probability density function.

Definition 5.3.11. A norm on a vector space V is a function $\|\cdot\| : V \to \mathbb{R}^+$ which satisfies:
(i) (Positive definiteness) $\|u\| \geq 0$, and $\|u\| = 0$ iff u = 0.
(ii) $\|\lambda u\| = |\lambda| \cdot \|u\|$.
(iii) (Triangle inequality) $\|u + v\| \leq \|u\| + \|v\|$.


A norm does not always have a corresponding inner product, but if you are given an 


inner product, you can always define a norm in terms of it. 


Lemma 5.3.12. If V is an inner product space, then a norm can be defined on V by

$$\|u\| := \sqrt{\langle u, u\rangle}.$$

Proof. HW. For the triangle inequality, use the Cauchy-Schwarz inequality: $|\langle x, y\rangle| \leq \|x\|\,\|y\|$.


Theorem 5.3.13. Suppose $(V, \langle\cdot,\cdot\rangle_C)$ is an inner product space, where the matrix $C = [c_{ij}]$ is defined by $c_{ij} = \langle u_i, u_j\rangle$, for some ONB $S = (u_1, u_2, \dots, u_n)$. Then for vectors expressed in terms of this ONB, the inner product behaves just like the dot product, i.e.,

$$v = \sum a_i u_i,\quad w = \sum b_i u_i \implies \langle v, w\rangle_C = \sum a_i b_i.$$

Proof. With respect to the basis S, the matrix $C = [c_{ij}]$ defined by $c_{ij} = \langle u_i, u_j\rangle$ is just the identity matrix I, so


HW] §5.3: #2, 3, 4, 10, 20, 24, 41, 48 


Let $C^1(a,b)$ be the set of continuously differentiable functions on [a,b]:

$$C^1(a,b) = \{f : [a,b] \to \mathbb{R} : f' \text{ exists and is continuous on } [a,b]\}.$$

Let $C_0^1(a,b)$ be the subset of $C^1(a,b)$ consisting of functions which vanish at the endpoints:

$$C_0^1(a,b) = \{f \in C^1(a,b) : f(a) = f(b) = 0\}.$$

Consider the transformation

$$D : C^1(a,b) \to C^0(a,b) \quad\text{defined by}\quad D(f) = f'.$$

(1) Check that $C^1(a,b)$ is a vector space, and that $C_0^1(a,b)$ is a subspace of $C^1(a,b)$.

(2) Check that D is a linear transformation.

(3) Check that D is an antisymmetric transformation, in the sense that

$$\langle f, Dg\rangle = -\langle Df, g\rangle,$$

where $\langle f, g\rangle = \int_a^b f(t)g(t)\,dt$.

(4) Explain why the property in the previous part agrees with the meaning of “antisymmetric” as we apply it to matrices.
5.3.1 Discursion relating to ODE and many other applications

There are some technical issues that arise with infinite-dimensional vector spaces. Consider the example of the vector space of sequences $\mathbb{R}^{\mathbb{N}}$: this is infinite-dimensional with basis

$$(e_1, e_2, e_3, \dots), \qquad (e_j)_k = e_j(k) = \begin{cases} 1, & j = k,\\ 0, & \text{else.}\end{cases}$$

The definition of the norm given here may not exist for all sequences.


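Example 5.3.14. For $u = (1,1,1,\dots) \in \mathbb{R}^{\mathbb{N}}$,

$$\|u\|^2 = \sum_{i=1}^{\infty} u_i^2 = \lim_{n\to\infty} \sum_{i=1}^{n} 1 = \lim_{n\to\infty} n = \infty.$$

This limit doesn’t exist, so neither does $\|u\|$!

Conclusion: $\mathbb{R}^{\mathbb{N}}$ is too big. Instead, consider just the sequences with finite norm:

$$\ell^2(\mathbb{N}) := \{u \in \mathbb{R}^{\mathbb{N}} : \|u\| < \infty\}.$$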


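Further complication: with infinite-dimensional vector spaces, there are different norms: the norm may determine whether or not something is in the vector space. For example:

$$\ell^1(\mathbb{N}) := \{u \in \mathbb{R}^{\mathbb{N}} : \|u\|_1 < \infty\}, \qquad \|u\|_1 = \sum_{i=1}^{\infty} |u_i|.$$

Then one can show the strict containment $\ell^1(\mathbb{N}) \subset \ell^2(\mathbb{N})$. For example, consider the sequence $u = (1, \tfrac12, \tfrac13, \dots)$, i.e., $u_n = \tfrac1n$. Then you have

$$\|u\|_2 = \Big(\sum_{n=1}^{\infty} \frac{1}{n^2}\Big)^{1/2} = \frac{\pi}{\sqrt{6}}, \quad\text{but}\quad \|u\|_1 = \sum_{n=1}^{\infty} |u_n| = \sum_{n=1}^{\infty} \frac{1}{n} = \infty.$$

What this means in terms of the normed vector spaces $\ell^2(\mathbb{N})$ and $\ell^1(\mathbb{N})$ is that

$$u \in \ell^2(\mathbb{N}) \quad\text{but}\quad u \notin \ell^1(\mathbb{N}).$$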
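The gap between the two norms can be seen numerically. The following is a minimal sketch, assuming the example sequence is $u_n = 1/n$ (this matches the sums displayed above); the function name `partial_norms` is made up for illustration. It computes partial sums only, so it suggests the behavior rather than proving it.

```python
import math

def partial_norms(n_terms):
    """Return (partial l1 sum, partial l2 norm) for u_n = 1/n, n = 1..n_terms."""
    l1 = sum(1.0 / n for n in range(1, n_terms + 1))
    l2 = math.sqrt(sum(1.0 / n**2 for n in range(1, n_terms + 1)))
    return l1, l2

# The l1 partial sums keep growing (roughly like log n),
# while the l2 partial norms level off near pi / sqrt(6).
for n_terms in (10, 1000, 100000):
    l1, l2 = partial_norms(n_terms)
    print(n_terms, round(l1, 4), round(l2, 4))

print("pi / sqrt(6) =", math.pi / math.sqrt(6))
```

The first column of output grows without bound as more terms are included, while the second settles down, which is what membership in $\ell^2$ but not $\ell^1$ looks like in practice.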
In general, for $1 \leq p < \infty$, the vector spaces

$$\ell^p(\mathbb{N}) := \{u \in \mathbb{R}^{\mathbb{N}} : \|u\|_p < \infty\}, \qquad \|u\|_p = \Big(\sum_{n=1}^{\infty} |u_n|^p\Big)^{1/p}$$

come up often, and are well-studied in a variety of applications. (Note that p, q do not need to be integers!) The limiting case is also well-studied:

$$\ell^\infty(\mathbb{N}) := \{u \in \mathbb{R}^{\mathbb{N}} : \|u\|_\infty < \infty\}, \qquad \|u\|_\infty = \sup_n |u_n|.$$

For $1 \leq p < q \leq \infty$, one has a strict containment $\ell^p(\mathbb{N}) \subset \ell^q(\mathbb{N})$.


Remark 5.3.15. Of all the normed vector spaces $\ell^p(\mathbb{N})$, for $1 \leq p \leq \infty$, the only one of these norms that has an associated inner product (i.e., that can be defined in terms of an inner product via $\|u\| = \sqrt{\langle u, u\rangle}$) is the case p = 2.

Theorem 5.3.16 (Cauchy-Schwarz Ineq). In any inner product space, $|\langle x, y\rangle| \leq \|x\|\,\|y\|$.

Alternative form in $\mathbb{R}^n$: if you square both sides, this reads

$$\Big(\sum x_i y_i\Big)^2 \leq \Big(\sum x_i^2\Big)\Big(\sum y_i^2\Big).$$


The proof requires a lemma about numbers. 


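Lemma 5.3.17. Let a, b, c be fixed (real) numbers with $b, c \geq 0$, and let t be a real variable. Then

$$0 \leq b - 2ta + t^2 c \ \ \forall t \implies a^2 \leq bc.$$

Proof. First, suppose c = 0. Then for every $t \in \mathbb{R}$, we have

$$2ta \leq b, \quad\text{so}\quad a \leq \frac{b}{2t} \text{ for } t > 0 \quad\text{and}\quad a \geq \frac{b}{2t} \text{ for } t < 0.$$

Since this is true for ANY t, letting $t \to \pm\infty$ shows we must have a = 0, and then $a^2 = 0 \leq bc$.

Now suppose $c \neq 0$. Since we are assuming the inequality to be true for ANY t, it is true for $t = \frac{a}{c}$, in which case

$$0 \leq b - 2\frac{a}{c}a + \frac{a^2}{c^2}c = b - \frac{a^2}{c} \implies \frac{a^2}{c} \leq b \implies a^2 \leq bc.$$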


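Proof of Cauchy-Schwarz in $\mathbb{R}^n$. Let t be a real variable again. Then $\sum_{i=1}^{n} (x_i - t y_i)^2 \geq 0$, because it is a sum of nonnegative numbers. Expanding the summand,

$$\begin{aligned}
0 &\leq \sum_{i=1}^{n} (x_i - t y_i)^2 \\
&= \sum_{i=1}^{n} \big(x_i^2 - 2t x_i y_i + t^2 y_i^2\big) && \text{FOIL} \\
&= \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} 2t x_i y_i + \sum_{i=1}^{n} t^2 y_i^2 && \text{rearrange} \\
&= \sum_{i=1}^{n} x_i^2 - 2t \sum_{i=1}^{n} x_i y_i + t^2 \sum_{i=1}^{n} y_i^2 && \text{factor out} \\
&= b - 2ta + t^2 c,
\end{aligned}$$

where $b = \sum_{i=1}^{n} x_i^2 \geq 0$ and $c = \sum_{i=1}^{n} y_i^2 \geq 0$ and $a = \sum_{i=1}^{n} x_i y_i$ may be any number. By the lemma, $a^2 \leq bc$, and hence

$$\Big(\sum x_i y_i\Big)^2 \leq \Big(\sum x_i^2\Big)\Big(\sum y_i^2\Big).$$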
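The quantities a, b, c in this proof are easy to compute for concrete vectors. The sketch below is only a numerical spot check on random vectors, not a proof; the helper name `cauchy_schwarz_terms` is invented for illustration.

```python
import random

def cauchy_schwarz_terms(x, y):
    """Return (a, b, c) with a = sum x_i*y_i, b = sum x_i^2, c = sum y_i^2."""
    a = sum(xi * yi for xi, yi in zip(x, y))
    b = sum(xi * xi for xi in x)
    c = sum(yi * yi for yi in y)
    return a, b, c

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(5)]
y = [random.uniform(-1, 1) for _ in range(5)]
a, b, c = cauchy_schwarz_terms(x, y)

# q(t) = b - 2ta + t^2 c is a sum of squares, so it is never negative.
# At t = a/c (the value used in the lemma, valid when c != 0) it equals b - a^2/c.
t_star = a / c
q_at_t_star = b - 2 * t_star * a + t_star**2 * c
print(a * a <= b * c, q_at_t_star >= 0)
```

Evaluating the quadratic at $t = a/c$ is exactly the step that turns nonnegativity into $a^2 \leq bc$.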


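Theorem 5.3.18 (Triangle Inequality). $\|x + y\| \leq \|x\| + \|y\|$.

Proof. We work with the squares and take roots at the end.

$$\begin{aligned}
\|x+y\|^2 &= \langle x+y, x+y\rangle \\
&= \langle x, x\rangle + 2\langle x, y\rangle + \langle y, y\rangle \\
&= \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 && \|u\|^2 = \langle u, u\rangle \\
&\leq \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 && u \leq |u| \\
&\leq \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 && \text{Cauchy-Schwarz} \\
&= (\|x\| + \|y\|)^2,
\end{aligned}$$

so $\|x + y\| \leq \|x\| + \|y\|$.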


Corollary 5.3.19 (Pythagoras). Let x, y be orthogonal. Then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

Proof. If x, y are orthogonal, then $\langle x, y\rangle = 0$, so from the previous proof,

$$\|x + y\|^2 = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 = \|x\|^2 + \|y\|^2.$$

Theorem 5.3.20 (Polarization identity). For $x, y \in \mathbb{R}^n$, $4\langle x, y\rangle = \|x + y\|^2 - \|x - y\|^2$.

Proof. FOIL out $\|x + y\|^2$ and $\|x - y\|^2$:

$$\|x + y\|^2 = \langle x+y, x+y\rangle = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2$$

$$\|x - y\|^2 = \langle x-y, x-y\rangle = \|x\|^2 - 2\langle x, y\rangle + \|y\|^2.$$

Subtracting the second from the first gives the identity.


The polarization identity allows you to express a dot product in terms of norms. 


Theorem 5.3.21 (Parallelogram law). For $x, y \in \mathbb{R}^n$, $2\|x\|^2 + 2\|y\|^2 = \|x + y\|^2 + \|x - y\|^2$.


Proof. HW. 


Think about this diagram when studying the Parallelogram and Polarization identities: 


HW] §5.3: #16, 33, 34, 35, 40, 42, 46


Keep the proof of the polarization identity in mind when doing #16. 
5.4 The Gram-Schmidt Algorithm

Corollary 5.4.1. If $\{v_i\}_{i=1}^{n}$ is an orthonormal basis of the inner product space $(\mathbb{R}^n, \langle\cdot,\cdot\rangle)$, then for any $u \in \mathbb{R}^n$,

$$u = \sum_{i=1}^{n} \langle u, v_i\rangle\, v_i.$$

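Example 5.4.2. Let

$$u = \begin{bmatrix} 3\\ 4\\ 5\end{bmatrix}, \quad\text{and}\quad v_1 = \begin{bmatrix} \tfrac23\\ -\tfrac23\\ \tfrac13\end{bmatrix},\ v_2 = \begin{bmatrix} \tfrac23\\ \tfrac13\\ -\tfrac23\end{bmatrix},\ v_3 = \begin{bmatrix} \tfrac13\\ \tfrac23\\ \tfrac23\end{bmatrix}.$$

One can check that $\{v_1, v_2, v_3\}$ is orthonormal, so there is a unique way to write u as a linear combination of the $v_i$s.

Without the theorem, you’d need to solve a system of 3 equations in 3 unknowns:

$$u = c_1 v_1 + c_2 v_2 + c_3 v_3.$$

With the theorem, you only need to compute some inner products:

$$\begin{aligned}
c_1 &= \langle u, v_1\rangle = \tfrac63 - \tfrac83 + \tfrac53 = 1,\\
c_2 &= \langle u, v_2\rangle = \tfrac63 + \tfrac43 - \tfrac{10}{3} = 0,\\
c_3 &= \langle u, v_3\rangle = \tfrac33 + \tfrac83 + \tfrac{10}{3} = 7,
\end{aligned}$$

to find $u = v_1 + 7v_3$.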
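The same coefficients can be checked with exact rational arithmetic. This is a small sketch, assuming the vectors of the example are u = (3, 4, 5), v1 = (2/3, -2/3, 1/3), v2 = (2/3, 1/3, -2/3), v3 = (1/3, 2/3, 2/3) (these reproduce the coefficients 1, 0, 7 computed above); the helper `inner` is just the dot product.

```python
from fractions import Fraction as F

def inner(u, v):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

v1 = [F(2, 3), F(-2, 3), F(1, 3)]
v2 = [F(2, 3), F(1, 3), F(-2, 3)]
v3 = [F(1, 3), F(2, 3), F(2, 3)]
u = [F(3), F(4), F(5)]
basis = (v1, v2, v3)

# Coordinates of u in the orthonormal basis: c_i = <u, v_i>.
coeffs = [inner(u, v) for v in basis]

# Rebuild u from its coordinates to confirm the expansion.
rebuilt = [sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(3)]
print(coeffs, rebuilt == u)
```

No linear system is solved here: three dot products give the coordinates, which is the whole point of working in an orthonormal basis.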


The Gram-Schmidt process is an algorithm which removes linear dependency from a 
set of given vectors in an inner product space. In particular, this means it can be used to 


convert a generic basis into an orthonormal basis. 


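Definition 5.4.3. When $U = \operatorname{span}\{u_1, \dots, u_k\}$ is a subspace of a vector space V, and $\{u_1, \dots, u_k\}$ are orthogonal vectors, the projection of v onto U is

$$\operatorname{proj}_U v = \sum_{i=1}^{k} \frac{\langle v, u_i\rangle}{\langle u_i, u_i\rangle}\, u_i.$$

So $v = \operatorname{proj}_U v + (v - \operatorname{proj}_U v)$, where $\operatorname{proj}_U v \in U$ and $\operatorname{proj}_U v \perp (v - \operatorname{proj}_U v)$.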
Theorem 5.4.4 (Gram-Schmidt). Let V be an inner product space and let W be a nonzero m-dimensional subspace of V. Then there exists an orthonormal basis for W.

Proof. Since every vector space has a basis, let $S = \{u_1, u_2, \dots, u_m\}$ be any basis for W. We construct a new basis, using these vectors, as follows:

(1) Define $v_1 := u_1$.

(2) Use projections to decompose $u_2$ into its components which are parallel to and orthogonal to $v_1 = u_1$:

$$u_2 = \operatorname{proj}_{v_1} u_2 + (u_2 - \operatorname{proj}_{v_1} u_2),$$

and then define

$$v_2 := u_2 - \operatorname{proj}_{v_1} u_2 = u_2 - \frac{\langle u_2, v_1\rangle}{\langle v_1, v_1\rangle}\, v_1.$$

Note that $v_2 \neq 0$, since $\{u_i\}_{i=1}^{m}$ is a basis of W, and is hence linearly independent. Also, note that $v_2$ is orthogonal to $v_1$ by construction:

$$\langle v_2, v_1\rangle = \langle u_2, v_1\rangle - \frac{\langle u_2, v_1\rangle}{\langle v_1, v_1\rangle}\langle v_1, v_1\rangle = 0.$$

(3) Use projections to decompose $u_3$ into its components which lie in $\operatorname{span}\{v_1, v_2\}$ and are orthogonal to $\operatorname{span}\{v_1, v_2\}$:

$$u_3 = \operatorname{proj}_{\operatorname{span}\{v_1, v_2\}} u_3 + \big(u_3 - \operatorname{proj}_{\operatorname{span}\{v_1, v_2\}} u_3\big),$$

and then define

$$v_3 := u_3 - \operatorname{proj}_{\operatorname{span}\{v_1, v_2\}} u_3.$$

Again, note that $v_3 \neq 0$, and that $v_3$ is orthogonal to both $v_1$ and $v_2$ by construction.

(4) Continue to construct the remaining basis elements via

$$v_k := u_k - \operatorname{proj}_{\operatorname{span}\{v_1, \dots, v_{k-1}\}} u_k,$$

until you run out of $u_k$s (when k = m).

By construction, this gives an orthogonal basis $\{v_i\}_{i=1}^{m}$ of W. For the final step, define

$$w_i := \frac{v_i}{\|v_i\|}$$

to obtain an orthonormal basis $\{w_i\}_{i=1}^{m}$.


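Example 5.4.5. Find an ONB for the subspace of $\mathbb{R}^3$ spanned by $\{x, y, z\}$, where

$$x = \begin{bmatrix} 1\\ -1\\ 0\end{bmatrix},\quad y = \begin{bmatrix} 2\\ 0\\ -2\end{bmatrix},\quad z = \begin{bmatrix} 3\\ -3\\ 3\end{bmatrix}.$$

(Note that these are independent.)

First:

$$v_1 = x = \begin{bmatrix} 1\\ -1\\ 0\end{bmatrix}.$$

Second: compute $v_2 = y - \operatorname{proj}_{\operatorname{span}\{v_1\}} y$.

$$v_2 = y - \operatorname{proj}_{v_1} y = \begin{bmatrix} 2\\ 0\\ -2\end{bmatrix} - \frac{2 + 0 + 0}{1^2 + (-1)^2 + 0^2}\begin{bmatrix} 1\\ -1\\ 0\end{bmatrix} = \begin{bmatrix} 1\\ 1\\ -2\end{bmatrix}.$$

Check that $v_2 \perp v_1$: $\langle v_2, v_1\rangle = 1 - 1 + 0 = 0$.

Third: compute $v_3 = z - \operatorname{proj}_{\operatorname{span}\{v_1, v_2\}} z$.

$$v_3 = z - \operatorname{proj}_{v_1} z - \operatorname{proj}_{v_2} z = \begin{bmatrix} 3\\ -3\\ 3\end{bmatrix} - \frac{3 + 3 + 0}{2}\begin{bmatrix} 1\\ -1\\ 0\end{bmatrix} - \frac{3 - 3 - 6}{6}\begin{bmatrix} 1\\ 1\\ -2\end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1\end{bmatrix}.$$

Check that $v_3 \perp \operatorname{span}\{v_1, v_2\}$: $\langle v_3, v_1\rangle = 1 - 1 + 0 = 0$ and $\langle v_3, v_2\rangle = 1 + 1 - 2 = 0$.

So $\{v_1, v_2, v_3\}$ is an orthogonal basis. To normalize and obtain an ONB, take

$$w_1 = \frac{1}{\sqrt2}\begin{bmatrix} 1\\ -1\\ 0\end{bmatrix},\quad w_2 = \frac{1}{\sqrt6}\begin{bmatrix} 1\\ 1\\ -2\end{bmatrix},\quad w_3 = \frac{1}{\sqrt3}\begin{bmatrix} 1\\ 1\\ 1\end{bmatrix}.$$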

HW] §5.4: #7, 8, 10, 11, 14, 18, 31-33 
5.5 Orthogonal complements

Definition 5.5.1. Let W be a subspace of an inner product space V. A vector u ∈ V is orthogonal to W iff it is orthogonal to every vector in W, i.e.,

$$w \in W \implies \langle u, w\rangle = 0.$$

The set of all vectors in V that are orthogonal to W is called the orthogonal complement of W and written

$$W^\perp := \{v \in V : v \perp W\}.$$

Theorem 5.5.2. Let W be a subspace of an inner product space V. Then
(1) $W^\perp$ is a subspace of V.
(2) $W \cap W^\perp = \{0\}$.
In cases when V is finite-dimensional, we also have
(3) $V = W \oplus W^\perp$.
(4) $(W^\perp)^\perp = W$.


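Proof. Part (1) already appeared in the HW: suppose that $x, y \in W^\perp$. Then

$$w \in W \implies \langle w, ax + by\rangle = a\langle w, x\rangle + b\langle w, y\rangle = a\cdot 0 + b\cdot 0 = 0.$$

For part (2), if $u \in W \cap W^\perp$, then u is orthogonal to itself: $\langle u, u\rangle = 0$. But then $\|u\| = 0$, so u = 0.

For part (3), let $\{w_i\}_{i=1}^{k}$ be a basis of W, which we can take to be orthonormal by Gram-Schmidt. For v ∈ V, take $u = v - \operatorname{proj}_W v$. Then

$$\langle u, w_i\rangle = \langle v - \operatorname{proj}_W v, w_i\rangle = \langle v, w_i\rangle - \Big\langle \sum_{j=1}^{k} \langle v, w_j\rangle w_j,\ w_i\Big\rangle = \langle v, w_i\rangle - \langle v, w_i\rangle = 0,$$

which shows that u is orthogonal to every basis vector of W, and hence also to every linear combination of basis vectors. In other words, u is orthogonal to everything in W. So $u \in W^\perp$. Since $v = \operatorname{proj}_W v + u$ with $\operatorname{proj}_W v \in W$, every vector of V is a sum of a vector in W and a vector in $W^\perp$, and the sum is direct by part (2).

For part (4), if w ∈ W, then w is orthogonal to every $u \in W^\perp$, so $w \in (W^\perp)^\perp$. This shows that W is a subspace of $(W^\perp)^\perp$. To see that $(W^\perp)^\perp$ is a subspace of W, pick any $v \in (W^\perp)^\perp$ and show that v ∈ W. By part (3), we can write v as v = w + u with w ∈ W and $u \in W^\perp$, so we’ll have v ∈ W if we can show u = 0:

$$\langle u, u\rangle = \langle u, v - w\rangle = \langle u, v\rangle - \langle u, w\rangle = 0 - 0 = 0.$$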


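Theorem 5.5.3 (Fundamental Theorem of Linear Algebra). Let A be an m × n matrix.
(a) The null space of A is the orthogonal complement of the row space of A.

(b) The null space of $A^T$ is the orthogonal complement of the column space of A.

[Figure: the four fundamental subspaces of A. On the domain side, $\mathbb{R}^n = \operatorname{rowsp}(A) \oplus \operatorname{null}(A)$, with $\operatorname{rowsp}(A) = \operatorname{ran}(A^T)$, $\operatorname{rank}(A) = r = \dim \operatorname{rowsp}(A)$, and $\operatorname{nullity}(A) = n - r$. On the codomain side, $\mathbb{R}^m = \operatorname{colsp}(A) \oplus \operatorname{null}(A^T)$, with $\operatorname{colsp}(A) = \operatorname{ran}(A)$, $\operatorname{rank}(A) = r = \dim \operatorname{colsp}(A)$, and $\operatorname{nullity}(A^T) = m - r$.]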


Theorem 5.5.4. Let W be a finite-dimensional subspace of the inner product space V, and pick v ∈ V. The closest point of W to v is $\operatorname{proj}_W v$, i.e.,

$$\|v - \operatorname{proj}_W v\| = \min_{w \in W} \|v - w\|.$$

Proof. Letting w ∈ W and decomposing v − w according to $V = W \oplus W^\perp$, we have

$$v - w = (\operatorname{proj}_W v - w) + (v - \operatorname{proj}_W v)$$

with $(\operatorname{proj}_W v - w) \in W$ and $(v - \operatorname{proj}_W v) \in W^\perp$. Since these are orthogonal, Pythagoras says that

$$\|v - w\|^2 = \|\operatorname{proj}_W v - w\|^2 + \|v - \operatorname{proj}_W v\|^2.$$

The choice of w that minimizes this expression is $w = \operatorname{proj}_W v$, which leaves $\|v - w\|^2 = \|v - \operatorname{proj}_W v\|^2$. The conclusion follows by taking square roots.


HW] §5.5: #4, 10, 11, 13, 18, 22, 26, 28 


Chapter 7 


Eigenvalues and eigenvectors 


7.1 Eigenvalues 


Definition 7.1.1. Let L : V → V be a linear operator on a vector space V. The number λ is called an eigenvalue of L iff there is a nonzero vector x ∈ V such that

$$L(x) = \lambda x.$$

Any nonzero x satisfying this equation is an eigenvector of L for the eigenvalue λ.

The eigenspace of λ is

$$E_\lambda := \operatorname{span}\{x : L(x) = \lambda x\}.$$

Note that x ≠ 0. Otherwise, $L(x) = L(0) = 0 = \lambda 0$ for any λ and the definition would be meaningless.

Any nonzero scalar multiple of an eigenvector is also an eigenvector:

$$L(cx) = cL(x) = c\lambda x = \lambda\cdot(cx).$$

Theorem 7.1.2. If dim V = n and $S = \{v_k\}_{k=1}^{n}$ is a basis for V, then

$$L(x) = Ax,$$

where the n × n matrix A has $k^{\text{th}}$ column equal to the S-coordinates of $L(v_k)$.

Note, however, that the eigenvalues are independent of any choice of basis.


7.1.1 Finding eigenvalues 
To go about finding eigenvalues:

$$L(x) = \lambda x \implies Ax - \lambda x = 0 \implies (\lambda I - A)x = 0.$$

This system always has the trivial solution x = 0, but we are interested exclusively in nonzero solutions when looking for eigenvalues, which means we need to find λ that satisfy

$$\det(\lambda I - A) = 0.$$


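Example 7.1.3. Suppose $L\big(\begin{bmatrix} 1\\ 0\end{bmatrix}\big) = \begin{bmatrix} 1\\ 2\end{bmatrix}$ and $L\big(\begin{bmatrix} 0\\ 1\end{bmatrix}\big) = \begin{bmatrix} 2\\ 1\end{bmatrix}$. Find the eigenvalues and vectors of L.

In the standard basis, L(x) = Ax for

$$A = \begin{bmatrix} 1 & 2\\ 2 & 1\end{bmatrix},$$

so find eigenvalues via

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -2\\ -2 & \lambda - 1\end{vmatrix} = (\lambda - 1)^2 - 4 = \lambda^2 - 2\lambda - 3 = (\lambda + 1)(\lambda - 3),$$

so the eigenvalues are λ = −1, 3.

Now, to find the eigenvectors.

For λ = −1, solve the homogeneous system (λI − A)x = 0:

$$\begin{bmatrix} -1-1 & -2\\ -2 & -1-1\end{bmatrix} = \begin{bmatrix} -2 & -2\\ -2 & -2\end{bmatrix} \sim \begin{bmatrix} 1 & 1\\ 0 & 0\end{bmatrix} \implies x_1 + x_2 = 0 \implies x = \begin{bmatrix} -1\\ 1\end{bmatrix}$$

(or any nonzero multiple, such as $\begin{bmatrix} 1\\ -1\end{bmatrix}$).

For λ = 3, solve the homogeneous system (λI − A)x = 0:

$$\begin{bmatrix} 3-1 & -2\\ -2 & 3-1\end{bmatrix} = \begin{bmatrix} 2 & -2\\ -2 & 2\end{bmatrix} \sim \begin{bmatrix} 1 & -1\\ 0 & 0\end{bmatrix} \implies x_1 - x_2 = 0 \implies x = \begin{bmatrix} 1\\ 1\end{bmatrix}.$$

Check that these are eigenvectors:

$$\lambda = -1:\quad \begin{bmatrix} 1 & 2\\ 2 & 1\end{bmatrix}\begin{bmatrix} 1\\ -1\end{bmatrix} = \begin{bmatrix} 1 - 2\\ 2 - 1\end{bmatrix} = \begin{bmatrix} -1\\ 1\end{bmatrix} = (-1)\begin{bmatrix} 1\\ -1\end{bmatrix}$$

$$\lambda = 3:\quad \begin{bmatrix} 1 & 2\\ 2 & 1\end{bmatrix}\begin{bmatrix} 1\\ 1\end{bmatrix} = \begin{bmatrix} 1 + 2\\ 2 + 1\end{bmatrix} = \begin{bmatrix} 3\\ 3\end{bmatrix} = (3)\begin{bmatrix} 1\\ 1\end{bmatrix}$$

Plotting these vectors before and after multiplication by A, one sees that (−1,1) “flips” and (1,1) is scaled by a factor of 3.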
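For a 2 × 2 matrix the characteristic polynomial is a quadratic, so the eigenvalues come straight from the quadratic formula. The following sketch assumes the matrix is A = [[1, 2], [2, 1]] as in the example; the helper names `eigenvalues_2x2` and `apply` are invented for illustration, and `cmath` is used so that the same code also copes with complex roots.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of det(lambda*I - A) = lambda^2 - (a + d)*lambda + (a*d - b*c)
    for A = [[a, b], [c, d]], via the quadratic formula."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2

def apply(A, x):
    """Multiply a 2x2 matrix (list of rows) by a vector."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

A = [[1, 2], [2, 1]]
lam_small, lam_big = eigenvalues_2x2(1, 2, 2, 1)
print(lam_small, lam_big)      # eigenvalues, returned as complex numbers
print(apply(A, [1, -1]))       # compare with (-1) * [1, -1]
print(apply(A, [1, 1]))        # compare with 3 * [1, 1]
```

The function only finds eigenvalues; eigenvectors still come from solving (λI − A)x = 0 for each root, as done by hand above.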


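Example 7.1.4. Solve the eigenproblem for

$$A = \begin{bmatrix} 4 & 2 & 2\\ -2 & -1 & -4\\ 0 & 0 & 3\end{bmatrix}.$$

$$\begin{aligned}
\det(\lambda I - A) &= \begin{vmatrix} \lambda - 4 & -2 & -2\\ 2 & \lambda + 1 & 4\\ 0 & 0 & \lambda - 3\end{vmatrix} = \lambda^3 - 6\lambda^2 + 5\lambda + 12 + 0 + 0 - 0 - 0 - (12 - 4\lambda)\\
&= \lambda(\lambda^2 - 6\lambda + 9)\\
&= \lambda(\lambda - 3)^2,
\end{aligned}$$

so the eigenvalues are λ = 0, 3, 3. Here 3 is an eigenvalue of multiplicity 2.

It is important to record the multiplicity of the eigenvalues!

Now, to find the eigenvectors.

For λ = 0, solve the homogeneous system (λI − A)x = 0:

$$\begin{bmatrix} -4 & -2 & -2\\ 2 & 1 & 4\\ 0 & 0 & -3\end{bmatrix} \sim \begin{bmatrix} -4 & -2 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix} \sim \begin{bmatrix} 2 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix}.$$

This gives $x_3 = 0$ and $2x_1 + x_2 = 0$, whence an eigenvector is any multiple of

$$\begin{bmatrix} 1\\ -2\\ 0\end{bmatrix}.$$

For λ = 3, solve the homogeneous system (λI − A)x = 0:

$$\begin{bmatrix} -1 & -2 & -2\\ 2 & 4 & 4\\ 0 & 0 & 0\end{bmatrix} \sim \begin{bmatrix} 1 & 2 & 2\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}.$$

This gives $x_1 + 2s + 2t = 0$, whence an eigenvector is any vector of the form

$$x = \begin{bmatrix} -2s - 2t\\ s\\ t\end{bmatrix} = s\begin{bmatrix} -2\\ 1\\ 0\end{bmatrix} + t\begin{bmatrix} -2\\ 0\\ 1\end{bmatrix}, \quad\text{so}\quad E_3 = \operatorname{span}\left\{\begin{bmatrix} -2\\ 1\\ 0\end{bmatrix}, \begin{bmatrix} -2\\ 0\\ 1\end{bmatrix}\right\}.$$

Check that these are eigenvectors:

$$\lambda = 0:\quad \begin{bmatrix} 4 & 2 & 2\\ -2 & -1 & -4\\ 0 & 0 & 3\end{bmatrix}\begin{bmatrix} 1\\ -2\\ 0\end{bmatrix} = \begin{bmatrix} 4 - 4 + 0\\ -2 + 2 + 0\\ 0 + 0 + 0\end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0\end{bmatrix} = (0)\begin{bmatrix} 1\\ -2\\ 0\end{bmatrix}$$

$$\lambda = 3:\quad \begin{bmatrix} 4 & 2 & 2\\ -2 & -1 & -4\\ 0 & 0 & 3\end{bmatrix}\begin{bmatrix} -2\\ 1\\ 0\end{bmatrix} = \begin{bmatrix} -8 + 2 + 0\\ 4 - 1 + 0\\ 0 + 0 + 0\end{bmatrix} = \begin{bmatrix} -6\\ 3\\ 0\end{bmatrix} = (3)\begin{bmatrix} -2\\ 1\\ 0\end{bmatrix}$$

$$\phantom{\lambda = 3:}\quad \begin{bmatrix} 4 & 2 & 2\\ -2 & -1 & -4\\ 0 & 0 & 3\end{bmatrix}\begin{bmatrix} -2\\ 0\\ 1\end{bmatrix} = \begin{bmatrix} -8 + 0 + 2\\ 4 + 0 - 4\\ 0 + 0 + 3\end{bmatrix} = \begin{bmatrix} -6\\ 0\\ 3\end{bmatrix} = (3)\begin{bmatrix} -2\\ 0\\ 1\end{bmatrix}$$

Geometric interpretation: there is a projection. The eigenvector for λ = 0 is a basis for null(A). You can see that it is orthogonal to the rowspace of A:

$$A = \begin{bmatrix} 4 & 2 & 2\\ -2 & -1 & -4\\ 0 & 0 & 3\end{bmatrix} \sim \begin{bmatrix} 2 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix} \implies \operatorname{rowspace}(A) = \operatorname{span}\left\{\begin{bmatrix} 2\\ 1\\ 0\end{bmatrix}, \begin{bmatrix} 0\\ 0\\ 1\end{bmatrix}\right\}.$$

Note that this basis for rowspace(A) has nothing to do with the eigenvectors of A except that they are orthogonal to $E_0$ (the similarity to the basis vectors of $E_3$ is coincidental).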


Handy tip: To factorize a polynomial of degree 3 or higher, start by looking for integer factors of $a_0$. This will give you one eigenvalue; then you can reduce the degree by polynomial long division.

Example 7.1.5. Suppose $p(\lambda) = 8 + 2\lambda - 5\lambda^2 + \lambda^3$. The factors of 8 are ±1, ±2, ±4, ±8, so check these by evaluation:

$$p(1) = 8 + 2 - 5 + 1 = 6 \neq 0$$

$$p(-1) = 8 - 2 - 5 - 1 = 0,$$

so λ = −1 is a root. Then divide out (λ + 1) to get

$$\frac{p(\lambda)}{\lambda + 1} = \lambda^2 - 6\lambda + 8.$$

Now it is easy to see that the other eigenvalues are λ = 2 and λ = 4.

Refresher on polynomial long division, dividing $x^3 - 5x^2 + 2x + 8$ by $x + 1$:

$$\begin{aligned}
(x^3 - 5x^2 + 2x + 8) - x^2(x + 1) &= -6x^2 + 2x + 8,\\
(-6x^2 + 2x + 8) + 6x(x + 1) &= 8x + 8,\\
(8x + 8) - 8(x + 1) &= 0,
\end{aligned}$$

so the quotient is $x^2 - 6x + 8$ with remainder 0.

Theorem 7.1.6. Suppose you have a polynomial

$$p(\lambda) = a_0 + a_1\lambda + a_2\lambda^2 + \dots + a_{n-1}\lambda^{n-1} + \lambda^n.$$

Then the product of the roots is $(-1)^n a_0$. Furthermore, if the coefficients $a_i$ are integers, then any rational roots must also be integers (and must divide $a_0$).

HW] §7.1: #6abd, 7, 8ac, 14, 15 (&16), 18, 21, 25, 27, 28

Also: prove that if 0 is an eigenvalue of A, then $E_0 = \operatorname{null}(A)$. Conversely, if $\operatorname{null}(A) \neq \{0\}$, prove that 0 is an eigenvalue of A.


7.1.2 Complex and degenerate eigenvalues 


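Example 7.1.7. Suppose L is given by L(x) = Ax where

$$A = \begin{bmatrix} \cos\varphi & -\sin\varphi\\ \sin\varphi & \cos\varphi\end{bmatrix}.$$

Find the eigenvalues and vectors of L.

Geometric interpretation of this example: the given transformation has the effect of rotating everything in $\mathbb{R}^2$ counterclockwise by φ. Every nonzero vector in $\mathbb{R}^2$ changes direction under this transformation (unless φ is a multiple of π). Therefore, this transformation has no eigenvalues. Or does it?

We have

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - \cos\varphi & \sin\varphi\\ -\sin\varphi & \lambda - \cos\varphi\end{vmatrix} = \lambda^2 - 2\lambda\cos\varphi + 1,$$

so the eigenvalues are $\lambda = \cos\varphi \pm \sqrt{\cos^2\varphi - 1}$.

If a polynomial with real coefficients has complex roots, these roots will always appear as complex conjugate pairs. In other words, if a + ib is a root, then a − ib will also be a root. Note: $(\lambda - (a + ib))(\lambda - (a - ib)) = \lambda^2 - 2a\lambda + (a^2 + b^2)$.

To make the computations less intense, we’ll find the eigenvectors for the case when $\varphi = \frac{\pi}{2}$, in which case the eigenvalues are

$$\lambda = \cos\varphi \pm \sqrt{\cos^2\varphi - 1} = 0 \pm \sqrt{0 - 1} = \pm\sqrt{-1} = \pm i.$$

For λ = i, solve the homogeneous system (λI − A)x = 0:

$$\begin{bmatrix} i - 0 & 1\\ -1 & i - 0\end{bmatrix} \sim \begin{bmatrix} i & 1\\ -i & -1\end{bmatrix} \sim \begin{bmatrix} i & 1\\ 0 & 0\end{bmatrix} \implies i x_1 + x_2 = 0 \implies x = \begin{bmatrix} 1\\ -i\end{bmatrix}.$$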
Notice that

$$(-i)(i) = (-1)(i)(i) = (-1)(-1) = 1 \quad\text{and}\quad i^2 = -1 = (-1)^2 i^2 = (-i)^2.$$

So another valid choice for the eigenvector would be

$$ix = i\begin{bmatrix} 1\\ -i\end{bmatrix} = \begin{bmatrix} i\\ 1\end{bmatrix}.$$

We saw already that if x is an eigenvector, then so is cx, where c is any nonzero scalar.

Here, c is now allowed to be a complex scalar.


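For λ = −i, solve the homogeneous system (λI − A)x = 0:

$$\begin{bmatrix} -i - 0 & 1\\ -1 & -i - 0\end{bmatrix} \sim \begin{bmatrix} i & -1\\ 0 & 0\end{bmatrix} \implies i x_1 - x_2 = 0 \implies x = \begin{bmatrix} 1\\ i\end{bmatrix}.$$

Check that these are eigenvectors:

$$\lambda = i:\quad \begin{bmatrix} 0 & -1\\ 1 & 0\end{bmatrix}\begin{bmatrix} 1\\ -i\end{bmatrix} = \begin{bmatrix} 0 + i\\ 1 + 0\end{bmatrix} = \begin{bmatrix} i\\ 1\end{bmatrix} = (i)\begin{bmatrix} 1\\ -i\end{bmatrix}$$

$$\lambda = -i:\quad \begin{bmatrix} 0 & -1\\ 1 & 0\end{bmatrix}\begin{bmatrix} 1\\ i\end{bmatrix} = \begin{bmatrix} 0 - i\\ 1 + 0\end{bmatrix} = \begin{bmatrix} -i\\ 1\end{bmatrix} = (-i)\begin{bmatrix} 1\\ i\end{bmatrix}$$

Geometric interpretation of this example: the given matrix has the effect of rotating everything in $\mathbb{R}^2$ counterclockwise by $\frac{\pi}{2}$.

Every nonzero vector in $\mathbb{R}^2$ changes direction under this transformation.

Multiplication by a complex number of norm 1 corresponds to rotation; see the related extra credit assignment. (In general, multiplying by a complex number involves both rotation and scaling.)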


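Example 7.1.8. Suppose $L\big(\begin{bmatrix} 1\\ 0\end{bmatrix}\big) = \begin{bmatrix} 1\\ 0\end{bmatrix}$ and $L\big(\begin{bmatrix} 0\\ 1\end{bmatrix}\big) = \begin{bmatrix} k\\ 1\end{bmatrix}$. Find the eigenvalues and vectors of L.

This transformation is a shear: the x-axis is fixed and the vector on the y-axis at height a gets shifted horizontally ka to the right. What lines in $\mathbb{R}^2$ are invariant under this transformation?

In the standard basis, L(x) = Ax for

$$A = \begin{bmatrix} 1 & k\\ 0 & 1\end{bmatrix},$$

so find eigenvalues via

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -k\\ 0 & \lambda - 1\end{vmatrix} = \lambda^2 - 2\lambda + 1 - 0 = (\lambda - 1)^2,$$

so the eigenvalues are λ = 1, 1.

Now, to find the eigenvectors.

For λ = 1, solve the homogeneous system (λI − A)x = 0 (assuming k ≠ 0):

$$\begin{bmatrix} 1 - 1 & -k\\ 0 & 1 - 1\end{bmatrix} = \begin{bmatrix} 0 & -k\\ 0 & 0\end{bmatrix} \sim \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix} \implies x_2 = 0 \implies x = \begin{bmatrix} t\\ 0\end{bmatrix},$$

and

$$E_1 = \operatorname{span}\left\{\begin{bmatrix} 1\\ 0\end{bmatrix}\right\}.$$

Now the multiplicity of λ = 1 was 2, but $\dim E_1 = 1$. What happened?
(The geometric multiplicity of λ is defined to be $\dim E_\lambda$, so this is sometimes referred to as the situation where the geometric multiplicity of λ is less than the algebraic multiplicity.)

A is “defective” in some sense: it has fewer eigenvectors than it should.

Definition 7.1.9. Suppose Ax = λx. Define η to be a degenerate eigenvector or generalized eigenvector for λ iff (λI − A)η = x.

Now if η is a generalized eigenvector,

$$(\lambda I - A)^2\eta = (\lambda I - A)(\lambda I - A)\eta = (\lambda I - A)x = 0,$$

since x was an eigenvector. Note that η is not an eigenvector. If it were, then you’d have (λI − A)η = 0, but (λI − A)η = x ≠ 0 (0 is never an eigenvector).

For generalized eigenvectors of λ = 1, solve the nonhomogeneous system (λI − A)η = x with $x = \begin{bmatrix} 1\\ 0\end{bmatrix}$:

$$\left[\begin{array}{cc|c} 0 & -k & 1\\ 0 & 0 & 0\end{array}\right] \implies \eta = \begin{bmatrix} t\\ -\tfrac1k\end{bmatrix} = t\begin{bmatrix} 1\\ 0\end{bmatrix} + \begin{bmatrix} 0\\ -\tfrac1k\end{bmatrix}.$$

From what we’ve seen before, any solution of (λI − A)η = x can be written as $\eta = \eta_p + \eta_h$ with $\eta_p \in \operatorname{rowspace}(\lambda I - A)$ and $\eta_h \in E_\lambda$.

Check the generalized eigenvector:

$$A\eta = \begin{bmatrix} 1 & k\\ 0 & 1\end{bmatrix}\begin{bmatrix} t\\ -\tfrac1k\end{bmatrix} = \begin{bmatrix} t - 1\\ -\tfrac1k\end{bmatrix} = \eta - x.$$

Thus, when multiplied by A, every vector of the form $\begin{bmatrix} t\\ -\tfrac1k\end{bmatrix}$ is shifted by 1 in the direction parallel to $x = \begin{bmatrix} 1\\ 0\end{bmatrix}$.

In other words, even though it is not a subspace of $\mathbb{R}^2$, the line

$$\left\{\begin{bmatrix} 0\\ -\tfrac1k\end{bmatrix} + t\begin{bmatrix} 1\\ 0\end{bmatrix} : t \in \mathbb{R}\right\}$$

is invariant under A.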


HW] §7.1: #6c, 9, 10, 11, 12, 22, 23, 29, 30b


For #23, it may help to look at 6c, which is an example of this phenomenon. 


Also: find the eigendata for the 2 x 2 matrix [k,0;0,k]. Can you give an example of a 


vector in R? which is not an eigenvector of this matrix? 
7.2 Diagonalization and similar matrices

We’ve seen that if $u = \sum a_i v_i$ is a linear combination of eigenvectors of L : V → V, then

$$L(u) = L\Big(\sum a_i v_i\Big) = \sum a_i L(v_i) = \sum a_i \lambda_i v_i. \tag{7.2.1}$$

Suppose that dim V = n, and that there are n linearly independent eigenvectors. In this case, the eigenvectors of L form a basis for V, and (7.2.1) can be expressed as multiplication by a diagonal matrix.

This is a very desirable situation, because diagonal matrices are so easy to work with. Let S denote this basis of eigenvectors, and let D be the diagonal matrix containing the eigenvalues of L, so we have

$$D = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix} \quad\text{and}\quad [u]_S = \begin{bmatrix} a_1\\ \vdots\\ a_n\end{bmatrix},$$

where u is expressed in S-coordinates. Now (7.2.1) becomes

$$L(u) = \sum a_i \lambda_i v_i = \begin{bmatrix} a_1\lambda_1\\ \vdots\\ a_n\lambda_n\end{bmatrix}_S = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix}\begin{bmatrix} a_1\\ \vdots\\ a_n\end{bmatrix} = D[u]_S.$$


Definition 7.2.1. Let L : V → V be a linear transformation. L is diagonalizable iff there exists a basis in which L(x) = Dx for a diagonal matrix D. The matrix A is diagonalizable iff it can be written $A = PDP^{-1}$ for a diagonal matrix D.

To diagonalize L(x) = Ax:

1. Find the characteristic polynomial p(λ) = det(λI − A).
2. Find the eigenvalues, i.e., the roots of p(λ).
3. For each eigenvalue λ, find a basis for the eigenspace $E_\lambda$.

• If $\dim E_\lambda = \operatorname{mult}(\lambda)$ for each λ, then A is diagonalizable.

• If $\dim E_\lambda < \operatorname{mult}(\lambda)$ for some λ, then A is not diagonalizable.

4. If A is diagonalizable, write $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ and $P = [v_1 \ \dots \ v_n]$.

Here, the $v_i$ are the columns of P, and they must appear in the same order as the $\lambda_i$.

Note: this method assumes you already have A (i.e., you have already picked a basis).


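Example 7.2.2. Let $L : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by

$$L\begin{bmatrix} u_1\\ u_2\\ u_3\end{bmatrix} = \begin{bmatrix} 2u_1 - u_3\\ u_1 + u_2 - u_3\\ u_3\end{bmatrix}.$$

With respect to the standard basis, this operator is given by matrix multiplication by

$$A = \begin{bmatrix} 2 & 0 & -1\\ 1 & 1 & -1\\ 0 & 0 & 1\end{bmatrix}.$$

However, consider the ordered basis

$$S = \left(\begin{bmatrix} 1\\ 0\\ 1\end{bmatrix}, \begin{bmatrix} 0\\ 1\\ 0\end{bmatrix}, \begin{bmatrix} 1\\ 1\\ 0\end{bmatrix}\right).$$

Taking these vectors as the columns of

$$P = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 0 & 0\end{bmatrix},$$

we get

$$D = P^{-1}AP = \begin{bmatrix} 0 & 0 & 1\\ -1 & 1 & 1\\ 1 & 0 & -1\end{bmatrix}\begin{bmatrix} 2 & 0 & -1\\ 1 & 1 & -1\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 0 & 0\end{bmatrix} = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 2\end{bmatrix}.$$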

Definition 7.2.3. If A and B are n × n matrices, they are called similar iff there is a nonsingular matrix P such that $B = P^{-1}AP$. This is written A ∼ B.

The following results are the two key facts about similar matrices.

Theorem 7.2.4. A ∼ B if and only if A and B represent the same linear transformation L : V → V. (If A ≠ B, then there are two different bases involved.)

Sketch of proof. Since P is invertible, it can be understood as a transition matrix between two bases, with a bit of computation. Conversely, if there are two bases given, then take P to be the transition matrix and A ∼ B will follow.


Corollary 7.2.5. Similar matrices have the same eigenvalues. 


Proof. Since eigenvalues are independent of any choice of basis for V, this follows from the 


above theorem. 


One can also prove the corollary with a calculation that is useful in its own right. 
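Suppose that A ∼ B and we denote the corresponding characteristic polynomials by $p_A(\lambda)$ and $p_B(\lambda)$. Then:

$$\begin{aligned}
p_B(\lambda) &= \det(\lambda I - B)\\
&= \det(\lambda I - P^{-1}AP)\\
&= \det(P^{-1}\lambda I P - P^{-1}AP)\\
&= \det\big(P^{-1}(\lambda I - A)P\big)\\
&= \det(P^{-1})\det(\lambda I - A)\det(P)\\
&= \det(\lambda I - A)\\
&= p_A(\lambda).
\end{aligned}$$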


For finite-dimensional vector spaces, this discussion implies: 


Theorem 7.2.6. A linear transformation L : V → V is diagonalizable if and only if the eigenvectors of L form a basis of V. Moreover, if D is the matrix representing L in this basis, then the entries on the diagonal of D are the eigenvalues of L.

In terms of matrices, this means:

Theorem 7.2.7. An n × n matrix A is similar to a diagonal matrix D if and only if A has n linearly independent eigenvectors. In this case, the entries on the diagonal of D are the eigenvalues of A.

There is also a useful result that tells us when we are in this desirable situation:

Theorem 7.2.8. If $\lambda_1, \dots, \lambda_k$ are eigenvalues of A that are all different (i.e., all have multiplicity 1), then the corresponding eigenvectors $\{v_1, \dots, v_k\}$ are linearly independent.

In particular, if all the eigenvalues of A are distinct, then A is diagonalizable.
Example 7.2.9. It is easy to tell if triangular matrices are diagonalizable:

$$A = \begin{bmatrix} 2 & 0 & 0 & 0\\ 1 & 4 & 0 & 0\\ -2 & 2 & 5 & 0\\ 1 & -3 & 8 & 1\end{bmatrix}$$

Note that the theorem gives a sufficient but not a necessary condition! A matrix can be diagonalizable without having distinct eigenvalues, just like an animal can have feathers without being a duck!


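Example 7.2.10. Earlier, we looked at the diagonalization

$$\begin{bmatrix} 2 & 0 & -1\\ 1 & 1 & -1\\ 0 & 0 & 1\end{bmatrix} = A = PDP^{-1} = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 0 & 0\end{bmatrix}\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 2\end{bmatrix}\begin{bmatrix} 0 & 0 & 1\\ -1 & 1 & 1\\ 1 & 0 & -1\end{bmatrix}.$$

From Theorem 7.2.6, we can infer that the eigenvalues of A are λ = 2, 1, 1.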


Can you also see this just from looking at A? 


HW] §7.2: #6, 8, 10, 16, 17abc, 22, 24 


I recommend doing #24 after doing (2) below: 


(1) If A is similar to B, write A ∼ B. Show that similarity is an equivalence relation, i.e., that it satisfies the following properties:
(a) A ∼ A.
(b) If A ∼ B, then B ∼ A.
(c) If A ∼ B and B ∼ C, then A ∼ C.

(2) Suppose that A ∼ B. In this case, prove the following:

If you do the Dynamical Systems extra credit, you will encounter a matrix in Problem 4 which superficially resembles the matrix in #16a (above) very closely. However, when you work through these problems, you’ll notice that they behave very differently.
7.3 Diagonalization of symmetric matrices

Given an n × n matrix A, we consider the problem of diagonalizing A: $A = PDP^{-1}$.

It turns out that the case of symmetric matrices $A = A^T$ is especially important.

Lemma 7.3.1. Let A be a symmetric matrix with eigenvalues $\{\lambda_k\}_{k=1}^{n}$ and corresponding eigenvectors $\{v_k\}_{k=1}^{n}$. Then
(i) If $\lambda_j \neq \lambda_k$, then $v_j \perp v_k$.
(ii) If $\lambda_j$ has multiplicity m, then $\lambda_j$ has m linearly independent eigenvectors.

Proof. For (i), we use the key fact about symmetric matrices: you can move them around inside an inner product.

$$\lambda_j\langle v_j, v_k\rangle = \langle \lambda_j v_j, v_k\rangle = \langle Av_j, v_k\rangle = \langle v_j, Av_k\rangle = \langle v_j, \lambda_k v_k\rangle = \lambda_k\langle v_j, v_k\rangle$$

Since $\lambda_j \neq \lambda_k$, the only way this can be true is if $\langle v_j, v_k\rangle = 0$.

The proof of (ii) would take a couple of weeks, so we’ll skip that one, too.

Symmetric matrices are important because symmetric operators are important.

Definition 7.3.2. L : V → V is a symmetric linear operator iff $\langle L(x), y\rangle = \langle x, L(y)\rangle$ for any x, y ∈ V.

It is easy to see that L is symmetric iff its representation L(x) = Ax in any orthonormal basis is given by a symmetric matrix:

$$\langle Ax, y\rangle = \langle L(x), y\rangle = \langle x, L(y)\rangle = \langle x, Ay\rangle = \langle A^T x, y\rangle \implies A = A^T.$$

Theorem 7.3.3 (Spectral theorem for finite-dimensional vector spaces). Let dim V = n and let L : V → V be a symmetric linear operator ($\langle L(x), y\rangle = \langle x, L(y)\rangle$). Then V has an ONB consisting of eigenvectors of L, and all the eigenvalues of L are real.

Proof. We’ll skip the proof that λ ∈ ℝ, as this could take a couple of weeks. The part about the ONB follows from the lemma: for each eigenspace $E_\lambda$, you can find an ONB by Gram-Schmidt. By part (i) of the lemma, all the eigenspaces are orthogonal, and by part (ii) of the lemma, collecting these bases gives you a collection of n vectors. Since they are all orthogonal, you have an ONB.
In the diagonalization $A = PDP^{-1}$ of A, this means that P has the form

$$P = \begin{bmatrix} | & & |\\ v_1 & \cdots & v_n\\ | & & |\end{bmatrix},$$

and as we’ve seen in the HW,

$$P^T P = \begin{bmatrix} - & v_1^T & -\\ & \vdots & \\ - & v_n^T & -\end{bmatrix}\begin{bmatrix} | & & |\\ v_1 & \cdots & v_n\\ | & & |\end{bmatrix} = I = P^{-1}P.$$

Definition 7.3.4. A square matrix A is orthogonal iff $A^T = A^{-1}$, i.e., iff $A^T A = I$.

Theorem 7.3.5. A is orthogonal if and only if the columns/rows of A form an ONB.


Theorem 7.3.6. If L : V → V is defined by L(x) = Ax and A is orthogonal, then L preserves all angles and distances.

Proof. For any x, y ∈ V,

$$\begin{aligned}
\langle L(x), L(y)\rangle &= \langle Ax, Ay\rangle && \text{def of } L\\
&= \langle x, A^T A y\rangle && \langle Bu, v\rangle = \langle u, B^T v\rangle\\
&= \langle x, A^{-1} A y\rangle && A \text{ is orthogonal}\\
&= \langle x, y\rangle.
\end{aligned}$$

Thus, the angle between x and y is the same, before and after applying L. This also shows that $\|L(x)\|^2 = \langle L(x), L(x)\rangle = \langle x, x\rangle = \|x\|^2$, so that the lengths of the vectors don’t change either.

As a consequence, the geometry of an orthogonal transformation is very simple: it corresponds to a rigid rotation and/or reflection.

Definition 7.3.7. An isometry is a linear transformation for which $\|L(x)\| = \|x\|$ for all x ∈ V.

In terms of the Basis Lemma, L is an isometry if and only if L gives a bijection between two ONBs. (Since both are orthogonal, all angles are preserved.)
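Example 7.3.8. Diagonalize

$$A = \begin{bmatrix} 4 & 2 & 4\\ 2 & 1 & 2\\ 4 & 2 & 4\end{bmatrix}.$$

Eigenvalues:

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 4 & -2 & -4\\ -2 & \lambda - 1 & -2\\ -4 & -2 & \lambda - 4\end{vmatrix} = \lambda^3 - 9\lambda^2 = \lambda^2(\lambda - 9).$$

So the eigenvalues are λ = 9, 0, 0.

Eigenvectors: $\lambda_1 = 9$.

$$9I - A = \begin{bmatrix} 5 & -2 & -4\\ -2 & 8 & -2\\ -4 & -2 & 5\end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & -\tfrac12\\ 0 & 0 & 0\end{bmatrix}$$

So $x_1 = x_3$ and $x_2 = \tfrac12 x_3$ gives eigenvector $v_1 = \begin{bmatrix} 2\\ 1\\ 2\end{bmatrix}$.

$\lambda_2 = \lambda_3 = 0$.

$$0I - A = \begin{bmatrix} -4 & -2 & -4\\ -2 & -1 & -2\\ -4 & -2 & -4\end{bmatrix} \sim \begin{bmatrix} 1 & \tfrac12 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}$$

So $x_1 + \tfrac12 x_2 + x_3 = 0$, or $x_1 = -\tfrac12 x_2 - x_3$, which gives

$$\begin{bmatrix} -\tfrac12 s - t\\ s\\ t\end{bmatrix} = \begin{bmatrix} -\tfrac12 s\\ s\\ 0\end{bmatrix} + \begin{bmatrix} -t\\ 0\\ t\end{bmatrix} \implies v_2 = \begin{bmatrix} -1\\ 2\\ 0\end{bmatrix},\ v_3 = \begin{bmatrix} -1\\ 0\\ 1\end{bmatrix}.$$

It is clear that $v_2 \perp v_1$ and $v_3 \perp v_1$ but $v_2 \not\perp v_3$. However, this just results from laziness with our choice of s = 2 and t = 1 to make the $v_i$s have integer entries. We could instead choose an orthogonal basis of $E_0$ by finding an element $av_2 + bv_3 \in E_0$ which is orthogonal to $v_2$:

$$\langle v_2, av_2 + bv_3\rangle = \begin{bmatrix} -1 & 2 & 0\end{bmatrix}\begin{bmatrix} -a - b\\ 2a\\ b\end{bmatrix} = a + b + 4a = 5a + b,$$

so choose b = −5a to get something orthogonal to $v_2$, so let a = −1 and b = 5:

$$\tilde v_3 = (-1)\begin{bmatrix} -1\\ 2\\ 0\end{bmatrix} + 5\begin{bmatrix} -1\\ 0\\ 1\end{bmatrix} = \begin{bmatrix} -4\\ -2\\ 5\end{bmatrix}.$$

This new $\tilde v_3$ is orthogonal to both $v_1$ and $v_2$, so normalizing gives an ONB.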
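The claims of this example (each vector is an eigenvector, and the three are mutually orthogonal) can be confirmed with integer arithmetic. This is a small sketch assuming the matrix A = [[4,2,4],[2,1,2],[4,2,4]] and the eigenvectors (2,1,2), (−1,2,0), (−4,−2,5) found above; the helper names are made up for illustration.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(u, v):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

A = [[4, 2, 4], [2, 1, 2], [4, 2, 4]]
eigenpairs = [(9, [2, 1, 2]), (0, [-1, 2, 0]), (0, [-4, -2, 5])]

# A v should equal lambda v for each pair.
is_eigen = [matvec(A, v) == [lam * vi for vi in v] for lam, v in eigenpairs]

# Gram matrix of the eigenvectors: off-diagonal zeros mean pairwise orthogonality.
gram = [[inner(u, v) for _, v in eigenpairs] for _, u in eigenpairs]
print(is_eigen)
print(gram)
```

The diagonal entries of the Gram matrix are the squared lengths, so dividing each vector by the square root of its diagonal entry gives the ONB.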


The key points of this example are that for this symmetric matrix A:
(i) the eigenvalues of multiplicity k have k independent eigenvectors, and

(ii) we were able to find an ONB for $\mathbb{R}^3$ consisting of eigenvectors of A.


HW] §7.3: #3, 4, 8, 9, 16, 17, 27, 37, 38


For #9, describe the transformation associated to each of these matrices in geometric 


terms. 
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7.3.1 Epilogue: an application of eigenvectors to calculus that 


you’ve probably already seen (but maybe didn’t know it) 


In multivariable calculus, when searching for the extrema of $f : \mathbb{R}^n \to \mathbb{R}$, an $\mathbb{R}$-valued function of $n$ variables, you first find critical points of $f$ by looking for $x = (x_1, \dots, x_n)$ where $\nabla f(x) = 0$. In single-variable calculus, this is called the 1st derivative test. Once you have located a critical point $x$, the next step is to determine whether $x$ is a minimum or a maximum of $f$ (or neither). For this, you use the 2nd derivative test:


• If $f''(x) < 0$ then $f$ has a local maximum at $x$.

• If $f''(x) > 0$ then $f$ has a local minimum at $x$.

• If $f''(x) = 0$, the second derivative test says nothing about the point $x$, except that it may be an inflection point.


The second partial derivatives of a function correspond to curvature (often referred to 
as “convexity” or “concavity” in single-variable calculus). To check curvature in higher 
dimensions (i.e., for a function of more than one variable), you need to consider all the 
second-order partials, so they are collected into a matrix. The matrix of partials is called 


the Hessian of f: 


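\[
H(f) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}.
\]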


The Hessian is also the coefficient of the quadratic term in the Taylor expansion of a multivariable function near $x \in \mathbb{R}^n$:
\[
f(x+y) \approx f(x) + \nabla f(x)^T y + \tfrac{1}{2}\, y^T H(x)\, y, \quad \text{for } y \approx 0.
\]
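The quality of this approximation can be checked numerically. The sketch below uses a hypothetical test function $f(x_1, x_2) = e^{x_1}\cos x_2$, with its gradient and Hessian worked out by hand, and compares the first-order and second-order approximations at one point. The function, the base point `x`, and the displacement `y` are all assumptions chosen for illustration.

```python
import math

def f(x1, x2):
    """Hypothetical test function, chosen only for illustration."""
    return math.exp(x1) * math.cos(x2)

def gradient(x1, x2):
    """Gradient of f, computed by hand."""
    return [math.exp(x1) * math.cos(x2), -math.exp(x1) * math.sin(x2)]

def hessian(x1, x2):
    """Hessian of f, computed by hand; note that it is symmetric."""
    c = math.exp(x1) * math.cos(x2)
    s = math.exp(x1) * math.sin(x2)
    return [[c, -s],
            [-s, -c]]

def taylor2(x, y):
    """f(x) + grad f(x)^T y + (1/2) y^T H(x) y."""
    g = gradient(*x)
    H = hessian(*x)
    linear = sum(g[i] * y[i] for i in range(2))
    quadratic = sum(y[i] * H[i][j] * y[j] for i in range(2) for j in range(2))
    return f(*x) + linear + 0.5 * quadratic

x = [0.5, 0.3]
y = [0.01, -0.02]
exact = f(x[0] + y[0], x[1] + y[1])
first_order = f(*x) + sum(g * h for g, h in zip(gradient(*x), y))

print(abs(exact - first_order))    # error of the linear approximation
print(abs(exact - taylor2(x, y)))  # smaller: the quadratic term is included
```

Shrinking `y` should make the second error fall off roughly like $\|y\|^3$, while the first falls off only like $\|y\|^2$.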
For functions $f : \mathbb{R}^2 \to \mathbb{R}$, the Hessian simplifies to
\[
H(f) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}
= \begin{bmatrix} f_{x_1 x_1} & f_{x_1 x_2} \\ f_{x_2 x_1} & f_{x_2 x_2} \end{bmatrix}
\]
and you are taught the 2nd partial derivative test: if $\nabla f(a,b) = 0$ and
\[
M := \det(H(f)) = f_{x_1 x_1} f_{x_2 x_2} - f_{x_1 x_2}^2
\]
(note that $M$ is a function of $x_1$ and $x_2$), then

(i) If $M(a,b) > 0$ and $f_{x_1 x_1}(a,b) > 0$ then $f$ has a local minimum at $(a,b)$.

(ii) If $M(a,b) > 0$ and $f_{x_1 x_1}(a,b) < 0$ then $f$ has a local maximum at $(a,b)$.

(iii) If $M(a,b) < 0$ then $f$ has a saddle point at $(a,b)$.

(iv) If $M(a,b) = 0$, the second derivative test says nothing about $(a,b)$, except that it may be an inflection point.
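The four cases translate directly into a small function. This is a minimal sketch; the function name and the parameter names `fxx`, `fxy`, `fyy` (standing for $f_{x_1x_1}$, $f_{x_1x_2}$, $f_{x_2x_2}$ evaluated at the critical point) are assumptions made for illustration. The maximum case requires $f_{x_1x_1}(a,b) < 0$.

```python
def second_partials_test(fxx, fxy, fyy):
    """Classify a critical point of f(x1, x2) from its second partials.

    fxx, fxy, fyy are the values of f_{x1x1}, f_{x1x2}, f_{x2x2} at the
    critical point (a, b).
    """
    M = fxx * fyy - fxy ** 2  # determinant of the 2x2 Hessian
    if M > 0 and fxx > 0:
        return "local minimum"
    if M > 0 and fxx < 0:
        return "local maximum"
    if M < 0:
        return "saddle point"
    return "inconclusive"  # M == 0

# Hypothetical examples, all with a critical point at the origin.
print(second_partials_test(2, 0, 2))    # f = x1^2 + x2^2
print(second_partials_test(-2, 0, -2))  # f = -x1^2 - x2^2
print(second_partials_test(2, 0, -2))   # f = x1^2 - x2^2
print(second_partials_test(0, 0, 0))    # f = x1^4 + x2^4
```

The last example actually has a minimum at the origin, which shows that "inconclusive" means only that this test cannot decide.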


What’s really going on here? The eigenvectors of the Hessian are the directions of principal 
curvature; intuitively, these are the directions in which the graph of the function is curving 
most rapidly and least rapidly (or most rapidly in the negative direction). 

Key point: Since mixed partials are equal, the Hessian is a symmetric matrix, 
and so we know these eigenvectors will be orthogonal (or can be taken to be 
orthogonal, in the case of a repeated eigenvalue). This has a physical consequence: the 


directions of principal curvature are orthogonal. 


• If the eigenvalues are both positive, then you are in case (i) above, and $f$ has a local minimum at $(a,b)$.

• If the eigenvalues are both negative, then you are in case (ii) above, and $f$ has a local maximum at $(a,b)$.

• If the eigenvalues have different signs $\lambda_1 > 0 > \lambda_2$, then you are in case (iii) above and there is a saddle point at $(a,b)$.
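The eigenvalue version of the test can be sketched for a symmetric $2 \times 2$ Hessian using the quadratic formula on its characteristic polynomial. The function names and the parameters `a`, `b`, `c` (the entries of the matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$) are assumptions made for illustration.

```python
import math

def symmetric_2x2_eigenvalues(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], largest first.

    They are the roots of lambda^2 - (a + c) lambda + (ac - b^2) = 0; the
    discriminant (a - c)^2 + 4 b^2 is never negative, so both roots are real.
    """
    mean = (a + c) / 2
    radius = math.sqrt((a - c) ** 2 + 4 * b ** 2) / 2
    return mean + radius, mean - radius

def classify_by_eigenvalues(a, b, c):
    lam1, lam2 = symmetric_2x2_eigenvalues(a, b, c)
    if lam2 > 0:            # both positive
        return "local minimum"
    if lam1 < 0:            # both negative
        return "local maximum"
    if lam1 > 0 > lam2:     # opposite signs
        return "saddle point"
    return "inconclusive"   # at least one zero eigenvalue

# Hypothetical example: f(x1, x2) = x1^2 + 3 x1 x2 + x2^2 at the origin.
lam1, lam2 = symmetric_2x2_eigenvalues(2, 3, 2)
print(lam1, lam2, lam1 * lam2)  # the product equals M = 2*2 - 3^2
print(classify_by_eigenvalues(2, 3, 2))
```

Since the determinant is the product of the eigenvalues, $M > 0$ says the two eigenvalues share a sign and $M < 0$ says they differ, which is why the determinant test and the eigenvalue test agree.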


The formulation in terms of $M$ above is a trick that calculus textbooks give you to test the eigenvalues of the Hessian without knowing what eigenvalues are. For $\mathbb{R}^n$ with $n > 2$, you look at the determinants of the upper left $k \times k$ submatrices, for $k = 1, \dots, n$. If they are all positive, then your eigenvalues will be all positive and you have a minimum; if they alternate in sign starting with a negative one (negative for odd $k$, positive for even $k$), then your eigenvalues will be all negative and you have a maximum.
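The criterion for $n > 2$ can be sketched as follows. The determinant is computed by cofactor expansion, which is adequate only for small matrices. The function names and the example Hessian `H` are assumptions made for illustration, and any sign pattern other than the two described above is reported as undecided rather than classified.

```python
def determinant(matrix):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * determinant(minor)
    return total

def leading_principal_minors(matrix):
    """Determinants of the upper left k x k submatrices, k = 1, ..., n."""
    n = len(matrix)
    return [determinant([row[:k] for row in matrix[:k]]) for k in range(1, n + 1)]

def classify_by_minors(hessian):
    minors = leading_principal_minors(hessian)
    if all(d > 0 for d in minors):
        return "local minimum"
    # Signs -, +, -, ...: the k-th minor has the sign of (-1)^k.
    if all((-1) ** k * d > 0 for k, d in enumerate(minors, start=1)):
        return "local maximum"
    return "not decided by these two patterns"

# Hypothetical 3x3 Hessian values at a critical point.
H = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
negative_H = [[-entry for entry in row] for row in H]

print(leading_principal_minors(H), classify_by_minors(H))
print(leading_principal_minors(negative_H), classify_by_minors(negative_H))
```

For $n = 2$ the two minors are $f_{x_1x_1}$ and $M$, so this reduces to cases (i) and (ii) of the 2nd partial derivative test.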
Additional upshot: Since the eigendata are independent of basis, the 2nd partial derivative test works for any choice of coordinate system.
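A short derivation makes this concrete in the special case of an orthonormal change of coordinates. The names $Q$, $u$, and $g$ are introduced here for illustration: $Q$ is an orthogonal matrix (so $Q^T Q = I$), the old and new coordinates are related by $x = Qu$, and $g(u) := f(Qu)$ is $f$ written in the new coordinates. By the chain rule,

\[
\nabla g(u) = Q^T \nabla f(Qu), \qquad H_g(u) = Q^T H_f(Qu)\, Q,
\]
\[
\det(\lambda I - Q^T H_f Q) = \det\big(Q^T (\lambda I - H_f)\, Q\big) = \det(Q^T)\det(\lambda I - H_f)\det(Q) = \det(\lambda I - H_f).
\]

So critical points of $f$ correspond to critical points of $g$, and the two Hessians have the same characteristic polynomial, hence the same eigenvalues. The signs that the test looks at are therefore unchanged.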


